Nonparametric methods have been very popular in the last couple
of decades in time series and regression, but no such development
has taken place for spatial models. A rather obvious reason
for this is the curse of dimensionality. For spatial data on a grid 
evaluating the conditional mean given its closest neighbors requires a 
four-dimensional nonparametric regression. In this paper a semipara- 
metric spatial regression approach is proposed to avoid this problem. 
An estimation procedure based on combining the so-called marginal 
integration technique with local linear kernel estimation is developed 
in the semiparametric spatial regression setting. Asymptotic distribu- 
tions are established under some mild conditions. The same conver- 
gence rates as in the one-dimensional regression case are established. 
An application of the methodology to the classical Mercer and Hall 
wheat data set is given and indicates that one directional component 
appears to be nonlinear, which has gone unnoticed in earlier analyses. 

1. Introduction. Data collected at spatial sites occur in many scientific 
disciplines, such as econometrics, environmental science, epidemiology, image
analysis and oceanography. Often the sites are irregularly positioned,
but, with the increasing use of computer technology, data on a regular grid 
and measured on a continuous scale are becoming more and more common. 
This is the kind of data that we will be considering in this paper. 

In the statistical analysis of such data, almost exclusively, the emphasis 
has been on parametric modeling. So-called joint models were introduced in 
the papers by Whittle [36, 37], but, after the ground breaking paper by Besag
[1], the literature has been dominated by conditional models, in particular,
with the use of Markov fields and Markov chain Monte Carlo techniques. 
Another large branch of literature, mainly on irregularly positioned data, 
though, is concerned with the various methods of kriging which are based 
on parametric assumptions; see, for example, [6], Chapters 2-5.

In time series and regression, nonparametric methods have been very 
popular both for prediction and characterizing nonlinear dependence. No 
such development has taken place for spatial lattice models. Since the data 
are already on a grid, unless there are missing data, the prediction issue is 
less relevant, but there is still a need to explore and characterize nonlinear 
dependence relations. A rather obvious reason for the lack of progress is the 
curse of dimensionality. For a time series $\{Y_t\}$, a nonparametric regression
$E[Y_t \mid Y_{t-1} = y]$ of $Y_t$ on its immediate predecessor is one-dimensional, and
the corresponding Nadaraya-Watson (NW) estimator has good statistical
properties. For spatial data $\{Y_{ij}\}$ on a grid, however, the conditional mean
of $Y_{ij}$ given its closest neighbors $Y_{i-1,j}$, $Y_{i,j-1}$, $Y_{i+1,j}$ and $Y_{i,j+1}$ involves a

four-dimensional nonparametric regression. Formally this can be carried out 
using the NW estimator, and an asymptotic theory can be constructed. In 
practice, however, this cannot be recommended unless the number of data 
points is extremely large. 

In spite of these difficulties, there has been some recent theoretical work in 
this area. Kernel and nearest neighbor density estimates have been analyzed 
by Tran [33] and Tran and Yakowitz [34] under spatial mixing conditions. 
Clearly, in the marginal density estimation case, the curse of dimensionality 
is not an obstacle. The $L_1$ theory was established by Carbon, Hallin and
Tran [4], and developed further by Hallin, Lu and Tran [15] under spatial 
stability conditions, including spatial linear and nonlinear processes, without 
imposing the less verifiable mixing conditions. The asymptotic normality of 
the kernel density estimator was also established for spatial linear processes 
by Hallin, Lu and Tran [14]. Finally, the NW kernel method and the local 
linear spatial conditional regressor were treated by Lu and Chen [21, 22], 
Hallin, Lu and Tran [16] and others. We have found these papers useful in 
developing our theory, but our perspective is rather different. 

There are several ways of circumventing the curse of dimensionality in 
nonspatial regression. Perhaps the two most commonly used are semipara- 
metric models, which in this context will be taken to mean partially linear 
models, and additive models. Actually, Cressie ([6], page 283) points out the 
possibility of trying such models for spatial data, noting that the nonlinear 
krige technique called disjunctive kriging (cf. [29]) takes as its starting point 
an additive decomposition. The problem, as seen from a traditional Markov 
field point of view, is that additivity clashes with the spatial Markov
assumption. This is very different from the time series case where the partial
linear autoregressive model (see [9])

$$Y_t = \beta Y_{t-1} + g(Y_{t-2}) + e_t$$

is a Markov model of second order if $\{e_t\}$ consists of independent and
identically distributed (i.i.d.) random errors independent of $\{Y_{t-s}, s > 0\}$.

In the spatial case so far we have not been able to construct nonlinear 
additive or semiparametric models which are at the same time Markov. The 
problem can be illustrated by considering the line process $\{Y_i\}$. Assuming
$\{Y_i\}$ to be Markov on the line and conditional Gaussian with density

$$p(y_i \mid y_{i-1}, y_{i+1}) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y_i - g(y_{i-1}) - h(y_{i+1}))^2/(2\sigma^2)},$$

it is easily seen using formulae (2.2) and (3.3) of [1] that the Markov field
property implies $g(y) = h(y) = ay + b$ for two constants $a$ and $b$.

In ordinary regression, semiparametric and additive fitting can be thought
of as an approximation of conditional quantities such as $E[Y_t \mid Y_{t-1}, \ldots, Y_{t-k}]$,
and sometimes [31] interaction terms are included to improve this
approximation. The approximation interpretation continues to be valid in the spatial
case, so that semiparametric and additive models can be viewed as
approximations to conditional expressions such as $E[Y_{ij} \mid Y_{i-1,j}, Y_{i,j-1}, Y_{i+1,j}, Y_{i,j+1}]$.
The conditional spirit of Besag [1] is retained, being in terms of
conditional means, however, rather than conditional probabilities. (Note that,
also, in nonlinear time series, dependence is described by taking the
conditional mean as a starting point; see, in particular, the contributions by
Bjerve and Doksum [2] and Jones and Koch [19].) The conditional mean
$E[Y_{ij} \mid Y_{i-1,j}, Y_{i,j-1}, Y_{i+1,j}, Y_{i,j+1}]$, say, is meaningful if first-order moments
exist and if the conditional mean structure is invariant to spatial
translations. Mathematically, the approximation consists in projecting this
function on the set of semiparametric or additive functions. It is not claimed
that there is a Markov field model, or any other conditional model, that can
be exactly represented by this approximation. In this respect the situation
is the same as for nonlinear disjunctive kriging, where the conditional mean
of $Y_{ij}$ at a certain location is sought to be approximated by an additive
decomposition going over all of the remaining observations (cf. [6], page 279).
Classes of lattice models where there does exist an exact representation are
the class of auto-Gaussian models (cf. [1]) or unilateral one-quadrant
representations where $Y_{ij}$ is represented additively in terms of, say, $Y_{i,j-1}$
only and an independent residual term (cf. [23]). But the former is linear,
and the latter a "causal" unilateral expansion which may not be too realistic.
In general, in the nonlinear spatial case, one must live with the
approximative aspect. In practical time series modeling this is also the case, but in
that situation at least one is able to write up a fairly general and exact
model, where $Y_t$ can be expressed as an additive function of past values and
an independent residual term. Fortunately, the asymptotic theory does not
require the existence of such a representation.

The purpose of this paper is then to develop estimators for a spatial semi- 
parametric (partially linear) structure and to derive their asymptotic prop- 
erties. In the companion paper by Lu et al. [23], the additive approximation 
is analyzed using a different setup and different techniques of estimation. An 
advantage of using the partially linear approach is that a priori information 
concerning possible linearity of some of the components can be included in 
the model. More specifically, we will look at approximating the conditional
mean function $m(X_{ij}, Z_{ij}) = E(Y_{ij} \mid X_{ij}, Z_{ij})$ by a semiparametric (partially
linear) function of the form

(1.1) $$m_0(X_{ij}, Z_{ij}) = \mu + Z_{ij}^T\beta + g(X_{ij}),$$

such that $E[Y_{ij} - m_0(X_{ij}, Z_{ij})]^2$ or, equivalently, $E[m(X_{ij}, Z_{ij}) - m_0(X_{ij}, Z_{ij})]^2$
is minimized over a class of semiparametric functions of the form $m_0(X_{ij}, Z_{ij})$,
subject to $E[g(X_{ij})] = 0$ for the identifiability of $m_0(X_{ij}, Z_{ij})$, where $\mu$
is an unknown parameter, $\beta = (\beta_1, \ldots, \beta_q)^T$ is a vector of unknown
parameters, $g(\cdot)$ is an unknown function over $\mathbb{R}^p$, $Z_{ij} = (Z_{ij}^{(1)}, \ldots, Z_{ij}^{(q)})^T$ and
$X_{ij} = (X_{ij}^{(1)}, \ldots, X_{ij}^{(p)})^T$ may contain both exogenous and endogenous
variables, that is, neighboring values of $Y_{ij}$. Moreover, a component $Z_{ij}^{(r)}$ of $Z_{ij}$
or a component $X_{ij}^{(s)}$ of $X_{ij}$ may itself be a linear combination of neighboring
values of $Y_{ij}$, as will be seen in Section 4, where sums such as $Y_{i-1,j} + Y_{i+1,j}$ and
$Y_{i,j-1} + Y_{i,j+1}$ serve as covariates.

Motivation for using the form (1.1) for nonspatial data analysis can be 
found in [17]. As for the nonspatial case, estimating $g(\cdot)$ in model (1.1) may
suffer from the curse of dimensionality when $g(\cdot)$ is not necessarily additive
and $p \ge 3$. Thus, we will propose approximating $g(\cdot)$ by $g_a(\cdot)$, an additive
marginal integration projector as detailed in Section 2 below. When $g(\cdot)$
itself is additive, that is, $g(x) = \sum_{l=1}^p g_l(x_l)$, $m_0(X_{ij}, Z_{ij})$ of (1.1) can be
written as

(1.2) $$m_0(X_{ij}, Z_{ij}) = \mu + Z_{ij}^T\beta + \sum_{l=1}^p g_l(X_{ij}^{(l)}),$$

subject to $E[g_l(X_{ij}^{(l)})] = 0$ for all $1 \le l \le p$ for the identifiability of $m_0(X_{ij}, Z_{ij})$
in (1.2), where $g_l(\cdot)$, $l = 1, \ldots, p$, are all unknown one-dimensional functions
over $\mathbb{R}^1$.

Our method of estimating $g(\cdot)$ or $g_a(\cdot)$ is based on an additive marginal
integration projection on the set of additive functions, but where, unlike
the backfitting case, the projection is taken with the product measure of
$X_{ij}^{(l)}$ for $l = 1, \ldots, p$ (cf. [27]). This contrasts with the smoothed
backfitting approach of Lu et al. [23], who base their work on an extension of the
techniques of Mammen, Linton and Nielsen [24] to the nonparametric
spatial regression case. Marginal integration, although inferior to backfitting in
asymptotic efficiency for purely additive models, seems well suited to the
framework of partially linear estimation. In fact, in previous work (cf. [8])
in the independent regression case marginal integration has been used, and
we do not know of any work extending the backfitting theory to the
partially linear case. Marginal integration techniques are also applicable to the
case where interactions are allowed between the $X_{ij}^{(k)}$-variables (cf. also the
use of marginal integration for estimating interactions in ordinary regression
problems).

We believe that our approach to analyzing spatial data is flexible. It 
permits nonlinearity and non-Gaussianity of real data. For example, re- 
analyzing the classical Mercer and Hall [26] wheat data set, one directional 
component appears to be nonlinear, and the fit is improved relative to ear- 
lier fits that have been linear. The presence of spatial dependence creates 
a host of new problems and, in particular, it has important effects on the 
estimation of the parametric component with asymptotic formulae different 
from those in the time series case. 

The organization of the paper is as follows. Section 2 develops the kernel 
based marginal integration estimation procedure for the forms (1.1) and (1.2). 
Asymptotic properties of the proposed procedures are given in Section 3. 
Section 4 discusses an application of the proposed procedures to the Mercer 
and Hall data. A short conclusion is given in Section 5. Mathematical details 
are relegated to the Appendix. 

2. Notation and definition of estimators. As mentioned after (1.1), we 
are approximating the mean function $m(X_{ij}, Z_{ij}) = E[Y_{ij} \mid X_{ij}, Z_{ij}]$ by
minimizing

$$E[Y_{ij} - m_0(X_{ij}, Z_{ij})]^2 = E[Y_{ij} - \mu - Z_{ij}^T\beta - g(X_{ij})]^2$$

over a class of semiparametric functions of the form $m_0(X_{ij}, Z_{ij}) = \mu + Z_{ij}^T\beta + g(X_{ij})$
with $E[g(X_{ij})] = 0$. Such a minimization problem is equivalent
to minimizing

$$E[Y_{ij} - \mu - Z_{ij}^T\beta - g(X_{ij})]^2 = E[E\{(Y_{ij} - \mu - Z_{ij}^T\beta - g(X_{ij}))^2 \mid X_{ij}\}]$$

over some $(\mu, \beta, g)$. This implies that $g(X_{ij}) = E[(Y_{ij} - \mu - Z_{ij}^T\beta) \mid X_{ij}]$ and
$\mu = E[Y_{ij} - Z_{ij}^T\beta]$, and $\beta$ is given by

$$\beta = \left(E[(Z_{ij} - E[Z_{ij} \mid X_{ij}])(Z_{ij} - E[Z_{ij} \mid X_{ij}])^T]\right)^{-1} E[(Z_{ij} - E[Z_{ij} \mid X_{ij}])(Y_{ij} - E[Y_{ij} \mid X_{ij}])],$$

provided that the inverse exists. This also shows that $m_0(X_{ij}, Z_{ij})$ is
identifiable under the assumption of $E[g(X_{ij})] = 0$.
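
The population formula for $\beta$ can be illustrated numerically. The following is a minimal sketch under simplifying assumptions that are not part of the paper: i.i.d. (nonspatial) toy data with $q = p = 1$, a Nadaraya-Watson smoother with a Gaussian kernel in place of the local linear estimator, and an arbitrary bandwidth. The helper `nw_smooth` and all data-generating choices are hypothetical.

```python
import numpy as np

def nw_smooth(x, v, h):
    """Nadaraya-Watson estimate of E[v | X = x_i] at each sample point (Gaussian kernel)."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w @ v) / w.sum(axis=1)

# Hypothetical i.i.d. toy data; not spatial and not taken from the paper.
rng = np.random.default_rng(0)
n = 1500
x = rng.uniform(-1.0, 1.0, n)
z = x ** 2 + rng.normal(size=n)                  # Z depends on X
# mu = 1, beta = 2, g(x) = sin(pi x) so that E[g(X)] = 0
y = 1.0 + 2.0 * z + np.sin(np.pi * x) + 0.1 * rng.normal(size=n)

h = 0.1                                          # illustrative bandwidth
rz = z - nw_smooth(x, z, h)                      # Z - E[Z | X]
ry = y - nw_smooth(x, y, h)                      # Y - E[Y | X]
beta_hat = np.sum(rz * ry) / np.sum(rz * rz)     # sample analogue of the formula for beta
mu_hat = y.mean() - beta_hat * z.mean()          # mu = E[Y - Z^T beta]
print(beta_hat, mu_hat)
```

Both the response and the linear covariate are first purged of their dependence on $X$, after which $\beta$ is an ordinary least squares coefficient on the residuals; this is the structure that the estimator in (2.10) below mimics with marginal integration projections.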

We now turn to estimation assuming that the data are available for 
$(Y_{ij}, X_{ij}, Z_{ij})$ for $1 \le i \le m$, $1 \le j \le n$. Since nonparametric estimation is
not much used for lattice data, and since the definitions of the estimators
to be used later are quite involved notationally, we start by outlining the
main steps in establishing estimators for $\mu$, $\beta$ and $g(\cdot)$ in (1.1) and then
$g_l(\cdot)$, $l = 1, 2, \ldots, p$, in (1.2). In the following, we give our outline in three
steps.

Step 1. Estimating $\mu$ and $g(\cdot)$ assuming $\beta$ to be known.

For each fixed $\beta$, since $\mu = E[Y_{ij}] - E[Z_{ij}^T\beta] = \mu_Y - \mu_Z^T\beta$, $\mu$ can be
estimated by $\hat\mu(\beta) = \bar Y - \bar Z^T\beta$, where $\mu_Y = E[Y_{ij}]$, $\mu_Z = (\mu_Z^{(1)}, \ldots, \mu_Z^{(q)})^T = E[Z_{ij}]$,
$\bar Y = \frac{1}{mn}\sum_{i=1}^m\sum_{j=1}^n Y_{ij}$ and $\bar Z = \frac{1}{mn}\sum_{i=1}^m\sum_{j=1}^n Z_{ij}$.
Moreover, the conditional expectation

$$g(x) = g(x, \beta) = E[(Y_{ij} - \mu - Z_{ij}^T\beta) \mid X_{ij} = x] = E[(Y_{ij} - E[Y_{ij}] - (Z_{ij} - E[Z_{ij}])^T\beta) \mid X_{ij} = x]$$

can be estimated by standard local linear estimation ([7], page 19), with
$\hat g_{m,n}(x, \beta) = \hat a_0(\beta)$ satisfying

(2.1) $$(\hat a_0(\beta), \hat a_1(\beta)) = \arg\min_{(a_0, a_1) \in \mathbb{R}^1 \times \mathbb{R}^p} \sum_{i=1}^m\sum_{j=1}^n \left(\tilde Y_{ij} - \tilde Z_{ij}^T\beta - a_0 - a_1^T(X_{ij} - x)\right)^2 K_{ij}(x, b),$$

where $\tilde Y_{ij} = Y_{ij} - \bar Y$ and $\tilde Z_{ij} = (\tilde Z_{ij}^{(1)}, \ldots, \tilde Z_{ij}^{(q)})^T = Z_{ij} - \bar Z$.
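
The weighted least squares problem in (2.1) can be sketched in a few lines. This is an illustration only, under assumptions not made in the paper: the lattice is flattened to $N = mn$ rows, the Epanechnikov kernel is used for $K$, and the response `resp` stands for the centered quantity $\tilde Y_{ij} - \tilde Z_{ij}^T\beta$ for a given $\beta$. The function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Bounded, symmetric, compactly supported kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_linear(x0, X, resp, b):
    """Local linear fit at x0 with a product kernel.

    X: (N, p) covariates, resp: (N,) responses, b: (p,) bandwidths.
    Returns (a0, a1): the fitted level and gradient at x0.
    """
    U = (X - x0) / b                              # scaled differences (X - x) / b
    k = np.prod(epanechnikov(U), axis=1)          # product kernel weights
    D = np.column_stack([np.ones(len(X)), X - x0])
    sw = np.sqrt(k)                               # weighted LS via rescaled rows
    coef, *_ = np.linalg.lstsq(D * sw[:, None], resp * sw, rcond=None)
    return coef[0], coef[1:]

# Usage on a hypothetical exactly linear response: the fit is reproduced exactly.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(400, 2))
resp = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
a0, a1 = local_linear(np.array([0.5, 0.5]), X, resp, np.array([0.3, 0.3]))
```

Solving the problem with a generic least squares routine is a design choice made for brevity; the closed-form expression in terms of the matrices $U_{m,n}$ and $V_{m,n}$ is given in the detailed version of step 1 below.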

Step 2. Marginal integration to obtain $g_1, \ldots, g_p$ of (1.2).
The idea of the marginal integration estimator is best explained if $g(\cdot)$ is
itself additive, that is, if

$$g(X_{ij}) = g(X_{ij}^{(1)}, \ldots, X_{ij}^{(p)}) = \sum_{l=1}^p g_l(X_{ij}^{(l)}).$$

Then, since $E[g_l(X_{ij}^{(l)})] = 0$ for $l = 1, \ldots, p$, for $k$ fixed,

$$g_k(x_k) = E[g(X_{ij}^{(1)}, \ldots, x_k, \ldots, X_{ij}^{(p)})]$$

and an estimate of $g_k$ is obtained by keeping $X_{ij}^{(k)}$ fixed at $x_k$ and then taking
the average over the remaining variables $X_{ij}^{(1)}, \ldots, X_{ij}^{(k-1)}, X_{ij}^{(k+1)}, \ldots, X_{ij}^{(p)}$.
This marginal integration operation can be implemented irrespective of
whether or not $g(\cdot)$ is additive. If the additivity does not hold, as
mentioned in the Introduction, the marginal integration amounts to a projection
on the space of additive functions of $X_{ij}^{(l)}$, $l = 1, \ldots, p$, taken with respect
to the product measure of $X_{ij}^{(l)}$, $l = 1, \ldots, p$, obtaining the approximation
$g_a(x, \beta) = \sum_{l=1}^p P_{l,w}(x_l, \beta)$, which will be detailed below with $\beta$
appearing linearly in the expression. In addition, it has been found convenient to
introduce a pair of weight functions $(w_k, w_{(-k)})$ in the estimation of each
component, hence, the index $w$ in $P_{l,w}$. The details are given in (2.7)-(2.9)
below.
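
The averaging idea of step 2 can be sketched on a known function. This is an illustration under assumptions of our own: $g$ is treated as known (in the paper it is replaced by the local linear estimate), the weight function $w_{(-k)}$ is taken to be identically 1, and the helper `marginal_integration` and the toy function are hypothetical.

```python
import numpy as np

def marginal_integration(g, X, k, grid):
    """For each grid value x_k, fix coordinate k at x_k and average g over the sample."""
    out = np.empty(len(grid))
    for t, xk in enumerate(grid):
        X_fixed = X.copy()
        X_fixed[:, k] = xk            # keep the k-th covariate fixed at x_k
        out[t] = g(X_fixed).mean()    # average over the remaining covariates
    return out

# Hypothetical additive function g(x) = sin(x_1) + x_2^2.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
g = lambda A: np.sin(A[:, 0]) + A[:, 1] ** 2
grid = np.linspace(-1.0, 1.0, 5)
P = marginal_integration(g, X, 0, grid)
```

For an additive $g$ the output equals $g_1$ on the grid plus a constant (here the sample mean of $X_{ij}^{(2)\,2}$), which is why the estimates are identified only up to a constant and are centered afterwards.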



Step 3. Estimating $\beta$.

The last step consists in estimating $\beta$. This is done by weighted least
squares, and it is easy since $\beta$ enters linearly in our expressions. In fact,
using the expression of $g(x, \beta)$ in step 1, we obtain the weighted least squares
estimator $\hat\beta$ of $\beta$ in (2.10) below. Finally, this is reintroduced in the
expressions for $\hat\mu$ and $\hat P$ resulting in the estimates in (2.11) and (2.12) below. In
the following, steps 1-3 are written correspondingly in more detail.



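Step 1. To write our expression for $(\hat a_0(\beta), \hat a_1(\beta))$ in (2.1), we need to
introduce some more notation. Let $K_{ij} = K_{ij}(x, b) = \prod_{l=1}^p K((X_{ij}^{(l)} - x_l)/b_l)$, with
$b = b_{m,n} = (b_1, \ldots, b_p)$, $b_l = b_{l,m,n}$ being a sequence of bandwidths for the $l$th
covariate variable $X_{ij}^{(l)}$, tending to zero as $(m, n)$ tends to infinity, and $K(\cdot)$
is a bounded kernel function on $\mathbb{R}^1$ (when we do the asymptotic analysis in
Section 3, we need to introduce a more refined choice of bandwidths, as is
explained just before stating Assumption 3.6). Denote

$$\mathcal{X}_{ij} = \mathcal{X}_{ij}(x, b) = \left(\frac{X_{ij}^{(1)} - x_1}{b_1}, \ldots, \frac{X_{ij}^{(p)} - x_p}{b_p}\right)^T$$

and let $b_\pi = \prod_{l=1}^p b_l$. We define

(2.2) $$u_{m,n,l_1 l_2} = (mn b_\pi)^{-1} \sum_{i=1}^m\sum_{j=1}^n (\mathcal{X}_{ij}(x, b))_{l_1} (\mathcal{X}_{ij}(x, b))_{l_2} K_{ij}(x, b), \quad 0 \le l_1, l_2 \le p,$$

where $(\mathcal{X}_{ij}(x, b))_l = (X_{ij}^{(l)} - x_l)/b_l$ for $1 \le l \le p$. We then let $(\mathcal{X}_{ij}(x, b))_0 \equiv 1$
and define

(2.3) $$v_{m,n,l}(\beta) = (mn b_\pi)^{-1} \sum_{i=1}^m\sum_{j=1}^n (\tilde Y_{ij} - \tilde Z_{ij}^T\beta)(\mathcal{X}_{ij}(x, b))_l K_{ij}(x, b)$$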

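and where, as before, $\tilde Y_{ij} = Y_{ij} - \bar Y$ and $\tilde Z_{ij} = Z_{ij} - \bar Z$.
Note that $v_{m,n,l}(\beta)$ can be decomposed as

(2.4) $$v_{m,n,l}(\beta) = v_{m,n,l}^{(0)} - \sum_{s=1}^q \beta_s v_{m,n,l}^{(s)} \quad \text{for } l = 0, 1, \ldots, p,$$

in which

$$v_{m,n,l}^{(0)} = v_{m,n,l}^{(0)}(x, b) = (mn b_\pi)^{-1} \sum_{i=1}^m\sum_{j=1}^n \tilde Y_{ij} (\mathcal{X}_{ij}(x, b))_l K_{ij}(x, b),$$

$$v_{m,n,l}^{(s)} = v_{m,n,l}^{(s)}(x, b) = (mn b_\pi)^{-1} \sum_{i=1}^m\sum_{j=1}^n \tilde Z_{ij}^{(s)} (\mathcal{X}_{ij}(x, b))_l K_{ij}(x, b), \quad 1 \le s \le q.$$

We can then express the local linear estimates in (2.1) as

(2.5) $$(\hat a_0(\beta), (\hat a_1(\beta) \odot b)^T)^T = U_{m,n}^{-1} V_{m,n}(\beta),$$

where $\odot$ is the operation of the component-wise product, that is, $a_1 \odot b =
(a_{11} b_1, \ldots, a_{1p} b_p)$ for $a_1 = (a_{11}, \ldots, a_{1p})$ and $b = (b_1, \ldots, b_p)$,

$$V_{m,n}(\beta) = \begin{pmatrix} v_{m,n,0}(\beta) \\ V_{m,n,1}(\beta) \end{pmatrix}, \qquad U_{m,n} = \begin{pmatrix} u_{m,n,00} & U_{m,n,01} \\ U_{m,n,10} & U_{m,n,11} \end{pmatrix},$$

where $U_{m,n,10} = U_{m,n,01}^T = (u_{m,n,01}, \ldots, u_{m,n,0p})^T$ and $U_{m,n,11}$ is the $p \times p$
matrix defined by $u_{m,n,l_1 l_2}$, with $l_1, l_2 = 1, \ldots, p$, in (2.2). Moreover,
$V_{m,n,1}(\beta) = (v_{m,n,1}(\beta), \ldots, v_{m,n,p}(\beta))^T$, with $v_{m,n,l}(\beta)$ as defined in (2.3). Analogously
for $V_{m,n}$, we may define $V_{m,n}^{(0)}$ and $V_{m,n}^{(s)}$ in terms of $v_{m,n,l}^{(0)}$ and $v_{m,n,l}^{(s)}$. Then
taking the first component with $\gamma = (1, 0, \ldots, 0)^T \in \mathbb{R}^{1+p}$,

$$\begin{aligned}
\hat g_{m,n}(x, \beta) &= \gamma^T U_{m,n}^{-1}(x) V_{m,n}(x, \beta) \\
&= \gamma^T U_{m,n}^{-1}(x) V_{m,n}^{(0)}(x) - \sum_{s=1}^q \beta_s \gamma^T U_{m,n}^{-1}(x) V_{m,n}^{(s)}(x) \\
&= H_{m,n}^{(0)}(x) - \beta^T H_{m,n}(x),
\end{aligned}$$

where $H_{m,n}(x) = (H_{m,n}^{(1)}(x), \ldots, H_{m,n}^{(q)}(x))^T$, with $H_{m,n}^{(s)}(x) = \gamma^T U_{m,n}^{-1}(x) V_{m,n}^{(s)}(x)$,
$1 \le s \le q$. Clearly, $H_{m,n}^{(s)}(x)$ is the local linear estimator of $H^{(s)}(x) = E[(Z_{ij}^{(s)} -
\mu_Z^{(s)}) \mid X_{ij} = x]$, $1 \le s \le q$.

We now define $Z_{ij}^{(0)} = Y_{ij}$ and $\mu_Z^{(0)} = \mu_Y$ such that $H^{(0)}(x) = E[(Z_{ij}^{(0)} -
\mu_Z^{(0)}) \mid X_{ij} = x] = E[Y_{ij} - \mu_Y \mid X_{ij} = x]$ and $H(x) = (H^{(1)}(x), \ldots, H^{(q)}(x))^T =
E[(Z_{ij} - \mu_Z) \mid X_{ij} = x]$. It follows that $g(x, \beta) = H^{(0)}(x) - \beta^T H(x)$, which
equals $g(x)$ under (1.1) irrespective of whether $g$ itself is additive.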
Step 2. Let $w_{(-k)}(\cdot)$ be a weight function defined on $\mathbb{R}^{p-1}$ such that
$E[w_{(-k)}(X_{ij}^{(-k)})] = 1$, and $w_k(x_k) = I_{[-L_k, L_k]}(x_k)$ defined on $\mathbb{R}^1$ for some
large $L_k > 0$, with

$$X_{ij}^{(-k)} = (X_{ij}^{(1)}, \ldots, X_{ij}^{(k-1)}, X_{ij}^{(k+1)}, \ldots, X_{ij}^{(p)}),$$

where $I_A(x)$ is the conventional indicator function.
For a given $\beta$, consider the marginal projection

(2.7) $$P_{k,w}(x_k, \beta) = E[g(X_{ij}^{(1)}, \ldots, X_{ij}^{(k-1)}, x_k, X_{ij}^{(k+1)}, \ldots, X_{ij}^{(p)}, \beta)\, w_{(-k)}(X_{ij}^{(-k)})]\, w_k(x_k).$$

It is easily seen that if $g$ is additive as in (1.2), then, for $-L_k \le x_k \le L_k$,
$P_{k,w}(x_k, \beta) = g_k(x_k)$ up to a constant since it is assumed that $E[w_{(-k)}(X_{ij}^{(-k)})] =
1$. In general, $g_a(x, \beta) = \sum_{l=1}^p P_{l,w}(x_l, \beta)$ is an additive marginal
projection approximation to $g(x)$ in (1.1) up to a constant in the region $x \in
\prod_{l=1}^p [-L_l, L_l]$. The quantity $P_{k,w}(x_k, \beta)$ can then be estimated by the spatial
locally linear marginal integration estimator

(2.8) $$\begin{aligned}
\hat P_{k,w}(x_k, \beta) &= (mn)^{-1} \sum_{i=1}^m\sum_{j=1}^n \hat g_{m,n}(X_{ij}^{(1)}, \ldots, X_{ij}^{(k-1)}, x_k, X_{ij}^{(k+1)}, \ldots, X_{ij}^{(p)}, \beta)\, w_{(-k)}(X_{ij}^{(-k)})\, w_k(x_k) \\
&= \hat P_{k,w}^{(0)}(x_k) - \sum_{s=1}^q \beta_s \hat P_{k,w}^{(s)}(x_k),
\end{aligned}$$

where

$$\hat P_{k,w}^{(s)}(x_k) = \frac{1}{mn} \sum_{i=1}^m\sum_{j=1}^n H_{m,n}^{(s)}(X_{ij}^{(1)}, \ldots, X_{ij}^{(k-1)}, x_k, X_{ij}^{(k+1)}, \ldots, X_{ij}^{(p)})\, w_{(-k)}(X_{ij}^{(-k)})\, w_k(x_k)$$

is the estimator of

$$P_{k,w}^{(s)}(x_k) = E[H^{(s)}(X_{ij}^{(1)}, \ldots, X_{ij}^{(k-1)}, x_k, X_{ij}^{(k+1)}, \ldots, X_{ij}^{(p)})\, w_{(-k)}(X_{ij}^{(-k)})]\, w_k(x_k),$$

for $0 \le s \le q$, and $P_{k,w}^Z(x_k) = (P_{k,w}^{(1)}(x_k), \ldots, P_{k,w}^{(q)}(x_k))^T$ is estimated by

$$\hat P_{k,w}^Z(x_k) = (\hat P_{k,w}^{(1)}(x_k), \ldots, \hat P_{k,w}^{(q)}(x_k))^T.$$

Here, we add the weight function $w_k(x_k) = I_{[-L_k, L_k]}(x_k)$ in the definition
of $\hat P_{k,w}^{(s)}(x_k)$, since we are only interested in the points of $x_k \in [-L_k, L_k]$ for
some large $L_k$. In practice, we may use a sample centered version of $\hat P_{k,w}^{(s)}(x_k)$
as the estimator of $P_{k,w}^{(s)}(x_k)$. Clearly, we have $\hat P_{k,w}(x_k, \beta) = \hat P_{k,w}^{(0)}(x_k) -
\beta^T \hat P_{k,w}^Z(x_k)$. Thus, for every $\beta$, $g(x) = g(x, \beta)$ of (1.1) [or rather the
approximation $g_a(x, \beta)$ if (1.2) does not hold] can be estimated by

(2.9) $$\hat{\hat g}(x, \beta) = \sum_{l=1}^p \hat P_{l,w}(x_l, \beta) = \sum_{l=1}^p \hat P_{l,w}^{(0)}(x_l) - \beta^T \sum_{l=1}^p \hat P_{l,w}^Z(x_l).$$

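Step 3. We can finally obtain the least squares estimator of $\beta$ by

(2.10) $$\begin{aligned}
\hat\beta &= \arg\min_\beta \sum_{i=1}^m\sum_{j=1}^n \left(\tilde Y_{ij} - \tilde Z_{ij}^T\beta - \hat{\hat g}(X_{ij}, \beta)\right)^2 \\
&= \arg\min_\beta \sum_{i=1}^m\sum_{j=1}^n \left(\hat Y_{ij}^* - (\hat Z_{ij}^*)^T\beta\right)^2,
\end{aligned}$$

where $\hat Y_{ij}^* = \tilde Y_{ij} - \sum_{l=1}^p \hat P_{l,w}^{(0)}(X_{ij}^{(l)})$ and $\hat Z_{ij}^* = \tilde Z_{ij} - \sum_{l=1}^p \hat P_{l,w}^Z(X_{ij}^{(l)})$. Therefore,

(2.11) $$\hat\beta = \left(\sum_{i=1}^m\sum_{j=1}^n \hat Z_{ij}^* (\hat Z_{ij}^*)^T\right)^{-1} \left(\sum_{i=1}^m\sum_{j=1}^n \hat Z_{ij}^* \hat Y_{ij}^*\right) \quad \text{and} \quad \hat\mu = \bar Y - \hat\beta^T \bar Z.$$

We then insert $\hat\beta$ in $\hat a_0(\beta) = \hat g_{m,n}(x, \beta)$ to obtain $\hat a_0(\hat\beta) = \hat g_{m,n}(x, \hat\beta)$. In
view of this, the spatial local linear projection estimator of $P_k(x_k)$ can be
defined by

(2.12) $$\hat{\hat P}_{k,w}(x_k) = (mn)^{-1} \sum_{i=1}^m\sum_{j=1}^n \hat g_{m,n}(X_{ij}^{(1)}, \ldots, X_{ij}^{(k-1)}, x_k, X_{ij}^{(k+1)}, \ldots, X_{ij}^{(p)}; \hat\beta)\, w_{(-k)}(X_{ij}^{(-k)})\, w_k(x_k),$$

and for $x_k \in [-L_k, L_k]$, this would estimate $g_k(x_k)$ up to a constant when
(1.2) holds. To ensure $E[g_k(X_{ij}^{(k)})] = 0$, we may use $\hat{\hat P}_{k,w}(x_k) - \hat\mu_P(k)$ for
the estimate of $g_k(x_k)$ in (1.2), where $\hat\mu_P(k) = \frac{1}{mn} \sum_{i=1}^m\sum_{j=1}^n \hat{\hat P}_{k,w}(X_{ij}^{(k)})$.

For the least squares estimator, $\hat\beta$, and $\hat{\hat P}_{k,w}(\cdot)$, we establish some
asymptotic distributions under mild conditions in Section 3.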

3. Asymptotic properties. Let $I_{m,n}$ be the rectangular region defined
by $I_{m,n} = \{(i, j) : (i, j) \in \mathbb{Z}^2, 1 \le i \le m, 1 \le j \le n\}$. We observe $\{(Y_{ij}, X_{ij}, Z_{ij})\}$
on $I_{m,n}$ with a sample size of $mn$.

In this paper we write $(m, n) \to \infty$ if

(3.1) $$\min\{m, n\} \to \infty.$$

In [33] it is required, in addition, that $m$ and $n$ tend to infinity at the same
rate:

(3.2) $$C_1 < |m/n| < C_2 \quad \text{for some } 0 < C_1 < C_2 < \infty.$$

Let $\{(Y_{ij}, X_{ij}, Z_{ij})\}$ be a strictly stationary random field indexed by $(i, j) \in
\mathbb{Z}^2$. A point in $\mathbb{Z}^2$ is referred to as a site. Let $S$ and $S'$ be two sets of sites.
The Borel fields $\mathcal{B}(S) = \mathcal{B}(Y_{ij}, X_{ij}, Z_{ij}; (i, j) \in S)$ and $\mathcal{B}(S') = \mathcal{B}(Y_{ij}, X_{ij}, Z_{ij};
(i, j) \in S')$ are the $\sigma$-fields generated by the random variables $(Y_{ij}, X_{ij}, Z_{ij})$,
with $(i, j)$ being elements of $S$ and $S'$, respectively. We will assume that
the variables $(Y_{ij}, X_{ij}, Z_{ij})$ satisfy the following mixing condition (cf. [33]):
There exists a function $\varphi(t) \downarrow 0$ as $t \to \infty$, such that, whenever $S, S' \subset \mathbb{Z}^2$,

(3.3) $$\begin{aligned}
\alpha(\mathcal{B}(S), \mathcal{B}(S')) &= \sup_{A \in \mathcal{B}(S),\, B \in \mathcal{B}(S')} \{|P(AB) - P(A)P(B)|\} \\
&\le f(\mathrm{Card}(S), \mathrm{Card}(S'))\, \varphi(d(S, S')),
\end{aligned}$$

where $\mathrm{Card}(S)$ denotes the cardinality of $S$, and $d$ is the distance defined
by

$$d(S, S') = \min\left\{\sqrt{|i - i'|^2 + |j - j'|^2} : (i, j) \in S,\ (i', j') \in S'\right\}.$$

Here $f$ is a symmetric positive function nondecreasing in each variable.
Throughout the paper, we only assume that $f$ satisfies

(3.4) $$f(n, m) \le \min\{m, n\}.$$

If $f \equiv 1$, then the spatial process $\{(Y_{ij}, X_{ij}, Z_{ij})\}$ is called strongly mixing.
Condition (3.4) holds in many cases. Examples can be found in [30]. For 
relevant work on random fields, see, for example, [3, 5, 12, 13, 20, 28, 32, 35]. 
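
The distance $d(S, S')$ entering the mixing condition is simply the smallest Euclidean distance between a site of $S$ and a site of $S'$. A minimal sketch follows; the function name `site_distance` and the example site sets are hypothetical.

```python
import math
from itertools import product

def site_distance(S, S_prime):
    """Smallest Euclidean distance between a site in S and a site in S_prime."""
    return min(math.hypot(i - ip, j - jp)
               for (i, j), (ip, jp) in product(S, S_prime))

# Two hypothetical sets of lattice sites.
S = {(1, 1), (1, 2)}
S_prime = {(4, 5), (4, 6)}
d = site_distance(S, S_prime)   # attained by the pair (1, 2) and (4, 5)
```

The bound in the mixing condition then multiplies $\varphi(d)$ by a factor depending only on the cardinalities of the two sets, here `len(S)` and `len(S_prime)`.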

To state and prove our main results, we introduce the following assump- 
tions. 

Assumption 3.1. Assume that the process $\{(Y_{ij}, X_{ij}, Z_{ij}) : (i, j) \in \mathbb{Z}^2\}$
is strictly stationary. The joint probability density $f_s(x_1, \ldots, x_s)$ of
$(X_{i_1 j_1}, \ldots, X_{i_s j_s})$ exists and is bounded for $s = 1, \ldots, 2r - 1$, where $r$ is some positive
integer such that Assumption 3.2(ii) below holds. For $s = 1$, we write $f(x)$
for $f_1(x_1)$, the density function of $X_{ij}$.

Assumption 3.2. (i) Let $Z_{ij}^* = Z_{ij} - \mu_Z - \sum_{l=1}^p P_{l,w}^Z(X_{ij}^{(l)})$ and $B^{ZZ} =
E[Z_{11}^* (Z_{11}^*)^T]$. The inverse matrix of $B^{ZZ}$ exists. Let $Y_{ij}^* = Y_{ij} - \mu_Y - \sum_{l=1}^p P_{l,w}^{(0)}(X_{ij}^{(l)})$
and $R_{ij} = Z_{ij}^* (Y_{ij}^* - Z_{ij}^{*T}\beta)$. Assume that the matrix
$\Sigma_B = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} E[(R_{00} - \mu_B)(R_{ij} - \mu_B)^T]$ is finite.

(ii) Suppose there is some $\lambda > 2$ such that $E[|Y_{ij}|^{\lambda r}] < \infty$ for $r$ as defined
in Assumption 3.1.

Assumption 3.3. The mixing coefficient $\varphi$ defined in (3.3) satisfies

(3.5) $$\lim_{T \to \infty} T^a \sum_{t=T}^{\infty} t^{2r-1} \{\varphi(t)\}^{(\lambda r - 2)/(\lambda r)} = 0$$

for some constant a > max(^^, 2+ Ar-4r ) ' A > 4 — ^ as in Assump- 
tion 3.2 (ii). In addition, the coefficient function / involved in (3.3) satis- 
fies (3.4). 

Assumption 3.4. (i) The functions $g(\cdot)$ in (1.1) and $g_l(\cdot)$ for $1 \le l \le p$
in (1.2) have bounded and continuous derivatives up to order 2. In addition,
the function $g(\cdot)$ has a second-order derivative matrix $g''(\cdot)$ (of dimension
$p \times p$), which is uniformly continuous on $\mathbb{R}^p$.

(ii) For each $k$, $1 \le k \le p$, the weight function $\{w_{(-k)}(\cdot)\}$ is uniformly
continuous on $\mathbb{R}^{p-1}$ and bounded on the compact support $S_w^{(-k)}$ of $w_{(-k)}(\cdot)$.
In addition, $E[w_{(-k)}(X_{ij}^{(-k)})] = 1$. Let $S_W = S_{W,k} = S_w^{(-k)} \times [-L_k, L_k]$ be the
compact support of $W(x) = W(x^{(-k)}, x_k) = w_{(-k)}(x^{(-k)}) \cdot I_{[-L_k, L_k]}(x_k)$. In
addition, let $\inf_{x \in S_W} f(x) > 0$ hold.

Assumption 3.5. The function $K(x)$ is a symmetric and bounded
probability density function on $\mathbb{R}^1$ with compact support, $C_K$, and finite variance
such that $|K(x) - K(y)| \le M|x - y|$ for $x, y \in C_K$ and $0 < M < \infty$.

When we are estimating the marginal projector $P_k$, the bandwidth $b_k$
associated with this component has to tend to zero at a rate slower than
$b_l$ for $l \ne k$. This means that, for each $k$, $1 \le k \le p$, we need a separate set
of bandwidths $b_1^{(k)}, \ldots, b_p^{(k)}$ such that $b_k^{(k)}$ tends to zero slower than $b_l^{(k)}$ for
all $l \ne k$. Correspondingly, we get $p$ different products $b_\pi^{(k)} = \prod_{l=1}^p b_l^{(k)}$. Since
in the following we will analyze one component $P_k$ at a time, to simplify
notation we omit the superscript $(k)$ and write $b_k$, $b_l$, $l \ne k$, and $b_\pi$ instead
of $b_k^{(k)}$, $b_l^{(k)}$, $l \ne k$, and $b_\pi^{(k)}$. It will be seen that this slight abuse of notation
does not lead to interpretational difficulties in the proofs. To have
consistency in notation, Assumptions 3.6 and 3.6' below are also formulated using
this notational simplification. Throughout the whole paper, we use $l$ as any
arbitrary index, while leaving $k$ for the fixed and specified index as suggested
by a referee.

Assumption 3.6. (i) Let $b_\pi$ be as defined before. The bandwidths
satisfy

$$\lim_{(m,n) \to \infty} \max_{1 \le l \le p} b_l = 0,$$

$$\lim_{(m,n) \to \infty} mn\, b_\pi^{1 + 2/r} = \infty,$$

$$\liminf_{(m,n) \to \infty} mn\, b_\pi^{[2(r-1)a + 2(\lambda r - 2)]/((a+2)\lambda)} > 0$$

for some integer $r \ge 3$ and some $\lambda > 2$ being the same as in Assumptions
3.1 and 3.2.

(ii) In addition, for some integer $r \ge 3$, the $k$th component satisfies

$$\limsup_{(m,n) \to \infty} mn\, b_k^5 < \infty,$$

$$\lim_{(m,n) \to \infty} \max_{1 \le l \ne k \le p} \frac{b_l}{b_k} = 0,$$

$$\lim_{(m,n) \to \infty} mn\, b_k^{4(2+r)/(2r-1)} = \infty.$$



Remark 3.1. (i) Assumptions 3.1, 3.2, 3.4 and 3.5 are relatively mild 
in this kind of problem, and can be justified in detail. For example, Assump- 
tion 3.1 is quite natural and corresponds to that used for the nonspatial case. 
Assumption 3.2(i) is necessary for the establishment of asymptotic normal- 
ity in the semiparametric setting. As can be seen from Theorem 3.1 below, 
the condition on the existence of the inverse matrix, $(B^{ZZ})^{-1}$, is required in
the formulation of that theorem. Moreover, Assumption 3.2(i) corresponds 
to those used for the nonspatial case. Assumption 3.2(ii) is needed as the 
existence of moments of higher than second order is required for this kind of 
problem when uniform convergence for nonparametric regression estimation 
is involved. Assumption 3.4(ii) is required due to the use of such a weight 
function. The continuity condition on the kernel function is quite natural 
and easily satisfied. 

(ii) As for the nonspatial case (see Condition A of [8]), some technical 
conditions are needed when marginal integration techniques are employed. 
In addition, some other technical conditions are required for the spatial case. 
Condition (3.5) requires some kind of rate of convergence for the mixing coef- 
ficient. It holds automatically when the mixing coefficient decreases to zero 
exponentially. For the nonspatial case, similar conditions have been used. 
See, for example, Condition A(vi) of [8]. For the spatial case, Assumption
3.6 requires that, when one of the bandwidths is proportional to $(mn)^{-1/5}$,
the optimal choice under a conventional criterion, the other bandwidths need
to converge to zero with a rate related to $(mn)^{-1/5}$. Assumption 3.6 is quite
complex in general. However, it holds in some cases. For example, when we
choose $p = 2$, $r = 3$, $\lambda = 4$, $a = 31$, $k = 1$, $b_1 = (mn)^{-1/5}$ and $b_2 = (mn)^{-2/5+\eta}$
for some $0 < \eta < \frac{1}{5}$, both (i) and (ii) hold. For instance,

$$\liminf_{(m,n) \to \infty} mn\, b_\pi^{[2(r-1)a + 2(\lambda r - 2)]/((a+2)\lambda)} = \liminf_{(m,n) \to \infty} (mn)^{19/55 + (12/11)\eta} = \infty > 0$$

and

$$\lim_{(m,n) \to \infty} mn\, b_\pi^{1 + 2/r} = \lim_{(m,n) \to \infty} (mn)^{(5/3)\eta} = \infty.$$

J. GAO, Z. LU AND D. TJØSTHEIM
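
The exponents in the example with $p = 2$, $r = 3$, $\lambda = 4$ and $a = 31$ can be checked with exact rational arithmetic. The sketch below assumes that the exponent of $b_\pi$ in the third condition of Assumption 3.6(i) is read as $[2(r-1)a + 2(\lambda r - 2)]/((a+2)\lambda)$ and that $b_\pi = b_1 b_2 = (mn)^{-3/5+\eta}$; the helper `mn_exponent` is hypothetical.

```python
from fractions import Fraction as F

def mn_exponent(power, b_const, b_eta):
    """Exponent of (mn) in mn * b_pi**power, where b_pi = (mn)**(b_const + b_eta * eta).

    Returns (constant part, coefficient of eta).
    """
    return 1 + power * b_const, power * b_eta

r, lam, a = 3, 4, 31
# b_1 = (mn)^(-1/5) and b_2 = (mn)^(-2/5 + eta), so b_pi = (mn)^(-3/5 + eta)
b_const, b_eta = F(-1, 5) + F(-2, 5), F(1)

p1 = F(2 * (r - 1) * a + 2 * (lam * r - 2), (a + 2) * lam)   # exponent in the liminf condition
p2 = 1 + F(2, r)                                              # exponent 1 + 2/r
exp1 = mn_exponent(p1, b_const, b_eta)
exp2 = mn_exponent(p2, b_const, b_eta)
print(p1, exp1, exp2)
```

Both exponents of $mn$ are positive for every $\eta > 0$, while the upper limit $\eta < 1/5$ comes from requiring $b_2/b_1 = (mn)^{-1/5+\eta} \to 0$.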

(iii) Similarly to the nonspatial case ([8], Remark 10), we assume that
all the nonparametric components are only two times continuously
differentiable and, thus, the optimal bandwidth $b_k$ is proportional to $(mn)^{-1/5}$. As
a result, Assumption 3.6 basically implies $p \le 4$. For our case, the
assumption of $p \le 4$ is just sufficient for us to use an additive model to
approximate the conditional mean $E[Y_{ij} \mid Y_{i-1,j}, Y_{i,j-1}, Y_{i+1,j}, Y_{i,j+1}]$ by $g_1(Y_{i-1,j}) +
g_2(Y_{i,j-1}) + g_3(Y_{i+1,j}) + g_4(Y_{i,j+1})$, with each $g_l(\cdot)$ being an unknown
function. In addition, for our case study in Section 4, we need only to use an
additive model of the form $g_1(X_{ij}^{(1)}) + g_2(X_{ij}^{(2)})$ to approximate the conditional
mean, where $X_{ij}^{(1)} = Y_{i,j-1} + Y_{i,j+1}$ and $X_{ij}^{(2)} = Y_{i-1,j} + Y_{i+1,j}$. Nevertheless,
we may ensure that the marginal integration method still works for the case
of $p \ge 5$ and achieves the optimal rate of convergence by using a high-order
kernel of the form

(3.6) $$\int K(x)\,dx = 1, \qquad \int x^i K(x)\,dx = 0 \ \text{ for } i = 1, \ldots, I - 1 \qquad \text{and} \qquad \int x^I K(x)\,dx \ne 0$$

for $I \ge 2$, as discussed in [18] for the nonspatial case, where $I$ is the order
of smoothness of the nonparametric components. To ensure that the
conclusions of the main results hold for this case, we need to replace Assumptions
3.4-3.6 by Assumptions 3.4'-3.6' below:

Assumption 3.4'. (i) The functions $g(\cdot)$ in (1.1) and $g_l(\cdot)$ for $1 \le l \le p$
in (1.2) have bounded and continuous derivatives up to order $I \ge 2$. In
addition, the function $g(\cdot)$ has an $I$-order derivative matrix $g^{(I)}(\cdot)$ (of dimension
$p \times p \times \cdots \times p$) which is uniformly continuous on $\mathbb{R}^p$.

(ii) Assumption 3.4(ii) holds.

Assumption 3.5'. Assumption 3.5(i) holds. In addition, the kernel func- 
tion satisfies (3.6). 
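
As an illustration of the moment conditions in (3.6), the sketch below checks them exactly for one fourth-order polynomial kernel, $K(u) = \frac{15}{32}(3 - 10u^2 + 7u^4)$ on $[-1, 1]$. This kernel is our own choice for illustration and is not taken from the paper; note that, like any kernel of order above 2, it takes negative values. The helper `kernel_moment` is hypothetical.

```python
from fractions import Fraction as F

# Polynomial coefficients of K(u) = (15/32) * (3 - 10 u^2 + 7 u^4) on [-1, 1],
# stored as {degree: coefficient}.
COEFFS = {0: F(45, 32), 2: F(-150, 32), 4: F(105, 32)}

def kernel_moment(i):
    """Exact value of the integral of u^i K(u) over [-1, 1]."""
    total = F(0)
    for degree, coef in COEFFS.items():
        n = degree + i
        if n % 2 == 0:                    # odd powers integrate to zero on [-1, 1]
            total += coef * F(2, n + 1)   # integral of u^n over [-1, 1] is 2 / (n + 1)
    return total

moments = [kernel_moment(i) for i in range(5)]
print(moments)
```

The zeroth moment is 1, the first three moments vanish and the fourth does not, so this kernel satisfies (3.6) with $I = 4$.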

Assumption 3.6'. (i) Assumption 3.6(i) holds.
(ii) In addition, for the $k$th component,

$$\limsup_{(m,n) \to \infty} mn\, b_k^{2I+1} < \infty,$$

$$\lim_{(m,n) \to \infty} \max_{1 \le l \ne k \le p} \frac{b_l}{b_k} = 0,$$

$$\lim_{(m,n) \to \infty} mn\, b_k^{4(2+r)/(2r-1)} = \infty$$

for $\lambda > 2$ and some integer $r \ge 3$.

After Assumptions 3.4-3.6 are replaced by Assumptions 3.4'-3.6', we may
show that the conclusions of the results remain true. Under Assumptions
3.4'-3.6', we will need to make changes at several places in the proofs of
Lemmas A.3-A.5 and Theorems 3.1 and 3.2. Apart from replacing
Assumptions 3.4-3.6 by Assumptions 3.4'-3.6' in their conditions, we need to replace
$\sum_{l=1}^p b_l^2$ by $\sum_{l=1}^p b_l^I$ and $\mu_2(K) = \int u^2 K(u)\,du$ by $\mu_I(K) = \int u^I K(u)\,du$, for
example, in several relevant places.

To verify Assumption 3.6', we can choose (remember the notational
simplification introduced just before Assumption 3.6) the optimal bandwidth
$b_k \sim (mn)^{-1/(2I+1)}$ and $b_l \sim (mn)^{-2/(2I+1)+\eta}$, with $0 < \eta < \frac{1}{2I+1}$ for all $l \ne k$.
In this case, it is not difficult to verify Assumption 3.6' for the case $p \ge 5$.
As expected, the order of the smoothness $I$ needs to be greater than 2. For
example, it is easy to see that Assumption 3.6' holds for the case $p = 6$ when
we choose $a = 31$, $r = 3$, $\lambda = 4$ and $I > 4 + \frac{1}{2}$. For instance, on the one hand,
in order to make sure that the condition $\lim_{(m,n) \to \infty} \max_{1 \le l \ne k \le p} b_l/b_k = 0$ holds,
we need to have $0 < \eta < \frac{1}{2I+1}$. On the other hand, in order to ensure that

$$\liminf_{(m,n) \to \infty} mn\, b_\pi^{[2(r-1)a + 2(\lambda r - 2)]/((a+2)\lambda)} = \liminf_{(m,n) \to \infty} (mn)^{(2I-11)/(2I+1) + (60/11)\eta} = \infty > 0$$

and

$$\lim_{(m,n) \to \infty} mn\, b_\pi^{1 + 2/r} = \lim_{(m,n) \to \infty} (mn)^{(6I-52)/(3(2I+1)) + (25/3)\eta} = \infty$$

both hold, we need to assume $\eta > \frac{52 - 6I}{25(2I+1)}$. Thus, we can choose $\eta$ such that
$\frac{52 - 6I}{25(2I+1)} < \eta < \frac{1}{2I+1}$ when $I > 4 + \frac{1}{2}$. The last equation of Assumption 3.6'(ii)
holds automatically when $I > 4 + \frac{1}{2}$.

As pointed out by a referee, in general, to ensure that Assumption 3.6'
holds, we will need to choose $\eta$ such that
$\frac{(2p-1)(r+2) - r(2I+1)}{(p-1)(r+2)(2I+1)} < \eta < \frac{1}{2I+1}$,
which implies that $(I,p,r)$ does need to satisfy
$I > \frac{(p-1)r+2p}{2r}$.




This suggests that, in order to achieve the rate-optimal property, we will 
need to allow that smoothness increases with dimensions. This is well known 
and has been used in some recent papers for the nonspatial case (see Con- 
ditions A5, A7 and NW2-NW3 of [18]). 
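
The bandwidth bookkeeping in this remark can be checked with exact rational arithmetic. The following is only a sketch under stated assumptions: the function names `eta_interval` and `min_smoothness` are hypothetical, the bandwidths are taken to be $b_k \sim (mn)^{-1/(2I+1)}$ and $b_l \sim (mn)^{-2/(2I+1)+\eta}$ for $l \ne k$, and only the two requirements $b_l/b_k \to 0$ and $mn\,b_\pi^{1+2/r} \to \infty$ are examined; the remaining conditions of Assumption 3.6' are not verified here.

```python
from fractions import Fraction


def eta_interval(I, p, r):
    """Admissible open interval (lower, upper) for eta, or None if empty.

    Assumes b_k ~ (mn)^(-1/(2I+1)) and b_l ~ (mn)^(-2/(2I+1)+eta), l != k.
    upper: needed for b_l / b_k -> 0.
    lower: needed for mn * b_pi^(1+2/r) -> infinity, b_pi = prod of all b_l.
    """
    upper = Fraction(1, 2 * I + 1)
    lower = Fraction((2 * p - 1) * (r + 2) - r * (2 * I + 1),
                     (p - 1) * (r + 2) * (2 * I + 1))
    lower = max(lower, Fraction(0))  # eta is also required to be positive
    return (lower, upper) if lower < upper else None


def min_smoothness(p, r):
    """Threshold that the smoothness order I must exceed for a non-empty interval."""
    return Fraction((p - 1) * r + 2 * p, 2 * r)


if __name__ == "__main__":
    # The case discussed in the text: p = 6 and r = 3.
    print(min_smoothness(6, 3))
    print(eta_interval(5, 6, 3))
    print(eta_interval(4, 6, 3))
```

With p = 6 and r = 3 the threshold evaluates to 9/2, in line with the requirement $I > 4 + \frac{1}{2}$ stated above, and the interval becomes empty once $I$ drops to 4.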

(iv) Assumptions 3.2(ii), 3.3 and 3.6 together require the existence
of $E[|Y_{ij}|^{10+\epsilon}]$ for some small $\epsilon > 0$. This may look
like a strong moment condition. However, this is weaker than
$E[|Y_{ij}|^k] < \infty$ for $k = 1,2,\ldots$ and $E[e^{|Y_{ij}|}] < \infty$
corresponding to those used in the nonspatial case.

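We can now state the asymptotic properties of the marginal integration
estimators for both the parametric and nonparametric components. Recall
that $Z^*_{ij} = Z_{ij} - \mu_Z - \sum_{l=1}^p P^{Z}_{l,w}(X^{(l)}_{ij})$,
$Y^*_{ij} = Y_{ij} - \mu_Y - \sum_{l=1}^p P^{(0)}_{l,w}(X^{(l)}_{ij})$ and
$R_{ij} = Z^*_{ij}(Y^*_{ij} - Z^{*T}_{ij}\beta)$.

Theorem 3.1. Assume that Assumptions 3.1-3.6 hold. Then under (3.1),

(3.7) $\sqrt{mn}\,[(\hat\beta - \beta) - \mu_\beta] \to_D N(0, \Sigma_\beta)$,

with $\mu_\beta = (B^{ZZ})^{-1}\mu_B$ and
$\Sigma_\beta = (B^{ZZ})^{-1}\Sigma_B((B^{ZZ})^{-1})^T$, where
$B^{ZZ} = E[Z^*_{11}Z^{*T}_{11}]$, $\mu_B = E[R_{ij}]$ and
$\Sigma_B = \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty}
E[(R_{00}-\mu_B)(R_{ij}-\mu_B)^T]$.
Furthermore, when (1.2) holds, we have

$$\mu_\beta = 0 \quad\text{and}\quad
\Sigma_\beta = (B^{ZZ})^{-1}\Sigma_B((B^{ZZ})^{-1})^T,$$

where $\Sigma_B = \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty}
E[R_{00}R_{ij}^T]$, with $R_{ij} = Z^*_{ij}\varepsilon_{ij}$ and
$\varepsilon_{ij} = Y_{ij} - m_0(X_{ij}, Z_{ij})
= Y_{ij} - \mu - Z_{ij}^T\beta - g(X_{ij})$.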

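Remark 3.2. Note that

$$\sum_{l=1}^p P^{(0)}_{l,w}(X^{(l)}_{ij}) - \beta^T\sum_{l=1}^p P^{Z}_{l,w}(X^{(l)}_{ij})
= \sum_{l=1}^p \bigl(P^{(0)}_{l,w}(X^{(l)}_{ij}) - \beta^T P^{Z}_{l,w}(X^{(l)}_{ij})\bigr)
= \sum_{l=1}^p P_{l,w}(X^{(l)}_{ij},\beta) = g_a(X_{ij},\beta).$$

Therefore, $Y^*_{ij} - Z^{*T}_{ij}\beta = \varepsilon_{ij} + g(X_{ij}) - g_a(X_{ij},\beta)$,
where $g(X_{ij}) - g_a(X_{ij},\beta)$ is the residual due to the additive
approximation. When (1.2) holds, it means that $g(X_{ij})$ in (1.1) has the
expression $g(X_{ij}) = \sum_{l=1}^p g_l(X^{(l)}_{ij})
= \sum_{l=1}^p P_{l,w}(X^{(l)}_{ij},\beta) = g_a(X_{ij},\beta)$ and
$H(X_{ij}) = \sum_{l=1}^p P^{Z}_{l,w}(X^{(l)}_{ij})$, and hence,
$Y^*_{ij} - Z^{*T}_{ij}\beta = \varepsilon_{ij}$. As $\beta$ minimizes
$L(\beta) = E[Y_{ij} - m_0(X_{ij}, Z_{ij})]^2$, we have $L'(\beta) = 0$ and
$E[\varepsilon_{ij}Z^*_{ij}] = E[\varepsilon_{ij}(Z_{ij} - E[Z_{ij}|X_{ij}])] = 0$
when (1.2) holds. This implies $E[R_{ij}] = 0$ and, hence, $\mu_\beta = 0$ in
(3.7) when the marginal integration estimation procedure is employed for
the additive form of $g(\cdot)$.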
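In both theory and practice, we need to test whether $H_0: \beta = \beta_0$
holds for a given $\beta_0$. The case where $\beta_0 = 0$ is an important
one. Before we state the next result, one needs to introduce some notation.
Let

$$\hat\mu_B = \frac{1}{mn}\sum_{i=1}^m\sum_{j=1}^n \hat R_{ij}, \qquad
\hat B^{ZZ} = \frac{1}{mn}\sum_{i=1}^m\sum_{j=1}^n \hat Z^*_{ij}(\hat Z^*_{ij})^T,$$

$$\hat\mu_\beta = (\hat B^{ZZ})^{-1}\hat\mu_B, \qquad
\hat\Sigma_\beta = (\hat B^{ZZ})^{-1}\hat\Sigma_B((\hat B^{ZZ})^{-1})^T,$$

in which $\hat R_{ij}$ and $\hat Z^*_{ij}$ are the estimated versions of
$R_{ij}$ and $Z^*_{ij}$, and $\hat\Sigma_B$ is a consistent estimator of
$\Sigma_B$ defined simply by

$$\hat\Sigma_B = \sum_{i=-M_m}^{M_m}\sum_{j=-N_n}^{N_n}\hat\gamma_{ij}, \qquad
\hat\gamma_{ij} =
\begin{cases}
\dfrac{1}{mn}\sum_{u=1}^{m-i}\sum_{v=1}^{n-j}
(\hat R_{uv}-\hat\mu_B)(\hat R_{u+i,v+j}-\hat\mu_B)^T, & \text{if (1.1) holds},\\[2ex]
\dfrac{1}{mn}\sum_{u=1}^{m-i}\sum_{v=1}^{n-j}
\hat R_{uv}\hat R_{u+i,v+j}^T, & \text{if (1.2) holds},
\end{cases}$$

where $M_m \to \infty$, $N_n \to \infty$, $M_m/m \to 0$ and $N_n/n \to 0$ as
$m \to \infty$ and $n \to \infty$. It can be shown that both
$\hat\mu_\beta$ and $\hat\Sigma_\beta$ are consistent estimators of
$\mu_\beta$ and $\Sigma_\beta$, respectively.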
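
As an illustration of a truncated (lag-window) estimator of this kind, here is a minimal sketch for the scalar case $q = 1$. It is not the authors' code: the function names `gamma_hat` and `sigma_B_hat` are hypothetical, the input `R` is assumed to be an m-by-n nested list of already computed scalar values $\hat R_{uv}$, and negative lags are handled by summing over all pairs of sites that fall inside the grid, which is one possible reading of the summation limits.

```python
def gamma_hat(R, i, j, center=True):
    """Sample lag-(i, j) autocovariance of a scalar field R (list of rows).

    center=True subtracts the sample mean (general model); center=False
    uses raw products (for the case where the mean of R is zero).
    """
    m, n = len(R), len(R[0])
    mu = sum(sum(row) for row in R) / (m * n) if center else 0.0
    total = 0.0
    for u in range(m):
        for v in range(n):
            uu, vv = u + i, v + j
            if 0 <= uu < m and 0 <= vv < n:  # keep only pairs inside the grid
                total += (R[u][v] - mu) * (R[uu][vv] - mu)
    return total / (m * n)


def sigma_B_hat(R, M, N, center=True):
    """Sum of sample autocovariances over lags |i| <= M, |j| <= N."""
    return sum(gamma_hat(R, i, j, center)
               for i in range(-M, M + 1)
               for j in range(-N, N + 1))


if __name__ == "__main__":
    # Tiny artificial field, used only to show the call pattern.
    R = [[1.0, -1.0], [-1.0, 1.0]]
    print(sigma_B_hat(R, 0, 0), sigma_B_hat(R, 1, 0))
```

In practice the truncation points `M` and `N` would have to grow slowly relative to the grid dimensions, as required by $M_m/m \to 0$ and $N_n/n \to 0$; the sketch leaves that choice to the caller.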

We are now in the position to state a corollary of Theorem 3.1 that can 
be used to test hypotheses about (3. 

Corollary 3.1. Assume that the conditions of Theorem 3.1 hold. Then
under (3.1),

(3.8) $\hat\Sigma_\beta^{-1/2}\sqrt{mn}\,[(\hat\beta-\beta)-\hat\mu_\beta] \to_D N(0, I_q)$

and

(3.9) $mn\,[(\hat\beta-\beta)-\hat\mu_\beta]^T\hat\Sigma_\beta^{-1}[(\hat\beta-\beta)-\hat\mu_\beta] \to_D \chi^2_q$.

Furthermore, when (1.2) holds, we have, under (3.1),

(3.10) $\hat\Sigma_\beta^{-1/2}\sqrt{mn}\,(\hat\beta-\beta) \to_D N(0, I_q)$

and

(3.11) $(\sqrt{mn}\,(\hat\beta-\beta))^T\hat\Sigma_\beta^{-1}(\sqrt{mn}\,(\hat\beta-\beta)) \to_D \chi^2_q$.

The proof of Theorem 3.1 is relegated to the Appendix, while the proof
of Corollary 3.1 is straightforward and therefore omitted.
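
A chi-square limit of this type leads to a Wald-type test of $H_0: \beta = \beta_0$. The sketch below covers only the scalar case $q = 1$ under the additive model, where no bias correction is needed. The function names are hypothetical, the variance estimate is assumed to be supplied by the user, and the constant 3.841 is the usual 95% quantile of the chi-square distribution with one degree of freedom.

```python
CHI2_1DF_95 = 3.841  # 95% quantile of chi-square with 1 degree of freedom


def wald_statistic(beta_hat, beta0, sigma_beta_hat, m, n):
    """mn * (beta_hat - beta0)^2 / sigma_beta_hat for a scalar parameter."""
    diff = beta_hat - beta0
    return m * n * diff * diff / sigma_beta_hat


def reject_h0(beta_hat, beta0, sigma_beta_hat, m, n, critical=CHI2_1DF_95):
    """True if the statistic exceeds the chosen chi-square critical value."""
    return wald_statistic(beta_hat, beta0, sigma_beta_hat, m, n) > critical


if __name__ == "__main__":
    # Purely illustrative inputs; sigma_beta_hat = 2.0 is a made-up value.
    print(wald_statistic(0.1, 0.0, 2.0, 10, 10))
    print(reject_h0(0.1, 0.0, 2.0, 10, 10))
```

For $q > 1$ the quadratic form would require inverting $\hat\Sigma_\beta$ and a critical value with $q$ degrees of freedom.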

Remark 3.3. Theorem 3.1 implies that there is a big difference between
the asymptotic variances in the spatial case and in the time series case. The
difference is mainly because the time series is unilateral, while the spatial
process is not. Let us consider the simplest case of a line process with
$p = q = 1$. In the corresponding time series case where
$Y_t = \beta Y_{t-1} + g(Y_{t-2}) + e_t$, $e_t$ is usually assumed to be
independent of the past information $\{Y_s, s < t\}$; then with
$Z_t = Y_{t-1}$ and $X_t = Y_{t-2}$,
$\varepsilon_t = Y_t - E(Y_t|X_t, Z_t) = e_t$, therefore
$R_t = Z^*_t\varepsilon_t = Z^*_t e_t$ (with $Z^*_t$ defined analogously to
$Z^*_{ij}$) is a martingale process with $E[R_0R_t] = 0$ for $t \ne 0$,
which leads to $\Sigma_B = E[R_0^2]$. However, in the bilateral case on the
line with the index taking values in $\mathbb{Z}^1$ where
$Y_t = \beta Y_{t-1} + g(Y_{t+1}) + e_t$, $e_t$ cannot be assumed to be
independent of $(Y_{t-1}, Y_{t+1})$ even when $e_t$ itself is an i.i.d.
normal process and $g$ is linear, since under some suitable conditions, as
shown in [36], the linear stationary solution may be of the form
$Y_t = \sum_{j=-\infty}^{\infty} a_j e_{t-j}$, with all $a_j$ nonzero. Then
with $Z_t = Y_{t-1}$ and $X_t = Y_{t+1}$,
$\varepsilon_t = Y_t - E(Y_t|X_t, Z_t) \ne e_t$, and usually
$E[R_0R_t] \ne 0$ for $t \ne 0$, which leads to $\Sigma_B \ne E[R_0^2]$.

Next we state the result for the nonparametric component. 

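Theorem 3.2. Assume that Assumptions 3.1-3.6 hold. Then under (3.1),
for $x_k \in [-L_k, L_k]$,

(3.12) $\sqrt{mn\,b_k}\,(\hat P_{k,w}(x_k) - P_{k,w}(x_k) - \mathrm{bias}_{1k}) \to_D N(0, \mathrm{var}_{1k})$,

where

$$\mathrm{bias}_{1k} = \frac{1}{2}b_k^2\mu_2(K)\int w_{(-k)}(x^{(-k)})
f_{(-k)}(x^{(-k)})\frac{\partial^2 g(x,\beta)}{\partial x_k^2}\,dx^{(-k)}$$

and

$$\mathrm{var}_{1k} = J\int V(x,\beta)
\frac{[w_{(-k)}(x^{(-k)})f_{(-k)}(x^{(-k)})]^2}{f(x)}\,dx^{(-k)},$$

with $J = \int K^2(u)\,du$, $\mu_2(K) = \int u^2K(u)\,du$,
$g(x,\beta) = E[(Y_{ij} - \mu - Z_{ij}^T\beta)|X_{ij} = x]$ and
$V(x,\beta) = E[(Y_{ij} - \mu - Z_{ij}^T\beta - g(x,\beta))^2|X_{ij} = x]$.

Furthermore, assume that the additive form (1.2) holds and that
$E[w_{(-k)}(X^{(-k)}_{ij})] = 1$. Then under (3.1),

(3.13) $\sqrt{mn\,b_k}\,(\hat g_k(x_k) - g_k(x_k) - \mathrm{bias}_{2k}) \to_D N(0, \mathrm{var}_{2k})$,

where

$$\mathrm{bias}_{2k} = \frac{1}{2}b_k^2\mu_2(K)\frac{\partial^2 g_k(x_k)}{\partial x_k^2}$$

and

$$\mathrm{var}_{2k} = J\int V(x,\beta)
\frac{[w_{(-k)}(x^{(-k)})f_{(-k)}(x^{(-k)})]^2}{f(x)}\,dx^{(-k)},$$

with $V(x,\beta) = E[(Y_{ij} - \mu - Z_{ij}^T\beta - \sum_{k=1}^p g_k(x_k))^2|X_{ij} = x]$.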

The proof of Theorem 3.2 is relegated to the Appendix. We finally state
the corresponding results of Theorems 3.1 and 3.2 under Assumptions 3.1-3.3
and 3.4'-3.6' in Theorem 3.3 below. Its proof is omitted.

Theorem 3.3. (i) Assume that Assumptions 3.1-3.3 and 3.4'-3.6' hold.
Then under (3.1), the conclusions of Theorem 3.1 hold.

(ii) Assume that Assumptions 3.1-3.3 and 3.4'-3.6' hold. Then under
(3.1), for $x_k \in [-L_k, L_k]$,

(3.14) $\sqrt{mn\,b_k}\,(\hat P_{k,w}(x_k) - P_{k,w}(x_k) - \mathrm{bias}_{1k}(I)) \to_D N(0, \mathrm{var}_{1k}(I))$,

where

$$\mathrm{bias}_{1k}(I) = \frac{1}{I!}b_k^I\mu_I(K)\int w_{(-k)}(x^{(-k)})
f_{(-k)}(x^{(-k)})\frac{\partial^I g(x,\beta)}{\partial x_k^I}\,dx^{(-k)}$$

and

$$\mathrm{var}_{1k}(I) = J\int V(x,\beta)
\frac{[w_{(-k)}(x^{(-k)})f_{(-k)}(x^{(-k)})]^2}{f(x)}\,dx^{(-k)},$$

with $g(x,\beta) = E[(Y_{ij} - \mu - Z_{ij}^T\beta)|X_{ij} = x]$,
$V(x,\beta) = E[(Y_{ij} - \mu - Z_{ij}^T\beta - g(x,\beta))^2|X_{ij} = x]$,
$J = \int K^2(u)\,du$ and $\mu_I(K) = \int u^IK(u)\,du$.

Furthermore, let the additive form (1.2) hold and
$E[w_{(-k)}(X^{(-k)}_{ij})] = 1$. Then under (3.1),

(3.15) $\sqrt{mn\,b_k}\,(\hat g_k(x_k) - g_k(x_k) - \mathrm{bias}_{2k}(I)) \to_D N(0, \mathrm{var}_{2k}(I))$,

where

$$\mathrm{bias}_{2k}(I) = \frac{1}{I!}b_k^I\mu_I(K)\frac{\partial^I g_k(x_k)}{\partial x_k^I}$$

and

$$\mathrm{var}_{2k}(I) = J\int V(x,\beta)
\frac{[w_{(-k)}(x^{(-k)})f_{(-k)}(x^{(-k)})]^2}{f(x)}\,dx^{(-k)},$$

with $V(x,\beta) = E[(Y_{ij} - \mu - Z_{ij}^T\beta - \sum_{k=1}^p g_k(x_k))^2|X_{ij} = x]$.
4. An illustrative example with simulation. In this section we consider
an application to the wheat data set of Mercer and Hall [26] as an illustration
of the theory and methodology established in this paper. This data set has
been analyzed by several investigators including Whittle [36] and Besag [1];
see also [25] on the analysis from the spectral perspective. It involves 500
wheat plots, each 11 ft by 10.82 ft, arranged in a 20 × 25 rectangle, plot
totals constituting the observations. Two measurements, grain yield and straw
yield, were made on each plot. Whittle [36] analyzed the grain yields, fitting
various stationary unconditional normal autoregressions. Besag [1] analyzed
the same data set, but on the basis of the homogeneous first- and second-order
auto-normal schemes [see (5.5) and (5.6) in [1], page 206], and found
that the first-order auto-normal scheme appears satisfactory ([1], page 221).
This model has the conditional mean of $Y_{ij}$, given all other site values,
equal to

(4.1) $\gamma_0 + \gamma_1(Y_{i-1,j} + Y_{i+1,j}) + \gamma_2(Y_{i,j-1} + Y_{i,j+1})$,

where we use $Y_{ij}$ to denote the grain yield, and $\gamma_0$, $\gamma_1$
and $\gamma_2$ are unknown parameters. For more details, the reader is
referred to the above references.

As a first step, we are concerned with whether or not the first-order scheme
is linear as in (4.1) or partially linear as in (1.2). This suggests considering
the additive first-order scheme

(4.2) $\mu + g_1(X^{(1)}_{ij}) + g_2(X^{(2)}_{ij})$,

where $X^{(1)}_{ij} = Y_{i-1,j} + Y_{i+1,j}$,
$X^{(2)}_{ij} = Y_{i,j-1} + Y_{i,j+1}$, $\mu$ is an unknown parameter and
$g_1(\cdot)$ and $g_2(\cdot)$ are two unknown functions on $\mathbb{R}^1$.
If the Besag scheme is correct, both (1.1) and (1.2) hold and are linear, and
one can model (4.2) as a special case of model (1.2) with $\beta = 0$.
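
To make the construction of the two first-order regressors concrete, the following sketch builds, for every interior site of a rectangular grid, the response together with the sum of its two neighbors along the first index and the sum of its two neighbors along the second index. The function name `first_order_regressors` and the nested-list representation of the yields are assumptions made for illustration; boundary sites are dropped because they lack a full set of neighbors.

```python
def first_order_regressors(Y):
    """Return (y, x1, x2) triples for all interior sites of the grid Y.

    Y  : list of m rows, each a list of n numbers (site values).
    x1 : sum of the two neighbors along the first index  (i-1, i+1).
    x2 : sum of the two neighbors along the second index (j-1, j+1).
    """
    m, n = len(Y), len(Y[0])
    triples = []
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            x1 = Y[i - 1][j] + Y[i + 1][j]
            x2 = Y[i][j - 1] + Y[i][j + 1]
            triples.append((Y[i][j], x1, x2))
    return triples


if __name__ == "__main__":
    # A 3 x 3 toy grid has exactly one interior site.
    toy = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
    print(first_order_regressors(toy))
```

On a 20 × 25 grid this leaves 18 × 23 = 414 interior sites.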

Next, we apply the approach established in this paper to estimate $g_1$ and
$g_2$. In doing so, the two bandwidths, $b_1 = 0.6$ and $b_2 = 0.7$, were
selected using a cross-validation selection procedure for the case of p = 2.
The resulting estimated functions of $g_1(\cdot)$ and $g_2(\cdot)$ are
depicted in Figure 1(a) and (b) with solid lines, respectively, where the
additive modeling, based on the modified backfitting algorithm proposed by
Mammen, Linton and Nielsen [24] in the i.i.d. case and developed by Lu et al.
[23] for the spatial process, is also plotted with dotted lines. We need to
point out that, in an asymptotic analysis of such a two-dimensional model,
two bandwidths tending to zero at different rates have to be used for each
component; thus, we will need to use four bandwidths altogether. But in a
finite sample situation like ours, we think that it may be better to rely on
cross-validation. This technique is certainly used in the nonspatial
situation too, even in cases where an optimal asymptotic formula exists.
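
The text does not spell out the cross-validation criterion that was used. As a generic illustration only, the sketch below implements leave-one-out cross-validation for a univariate local linear smoother with an Epanechnikov kernel; the kernel choice, the function names and the candidate grid are all assumptions, and the marginal integration step of the paper is not reproduced here.

```python
def epanechnikov(u):
    """Epanechnikov kernel with support (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear(x, y, x0, b, skip=None):
    """Local linear estimate at x0 with bandwidth b; None if ill-conditioned.

    skip: optional index of an observation to leave out.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for idx, (xi, yi) in enumerate(zip(x, y)):
        if idx == skip:
            continue
        d = xi - x0
        w = epanechnikov(d / b)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1
    if det <= 1e-12:  # fewer than two distinct points in the window
        return None
    return (s2 * t0 - s1 * t1) / det


def loo_cv_score(x, y, b):
    """Mean squared leave-one-out prediction error for bandwidth b."""
    errors = []
    for idx, (xi, yi) in enumerate(zip(x, y)):
        fit = local_linear(x, y, xi, b, skip=idx)
        if fit is not None:
            errors.append((yi - fit) ** 2)
    return sum(errors) / len(errors) if errors else float("inf")


def select_bandwidth(x, y, candidates):
    """Candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda b: loo_cv_score(x, y, b))
```

A call such as `select_bandwidth(x, y, [0.4, 0.5, 0.6, 0.7])` would return the candidate with the smallest leave-one-out error; for spatially dependent data one might also leave out a neighborhood of each site rather than a single point.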



Fig. 1. Estimated functions of semiparametric first-order schemes: (a) $g_1(x)$, (b) $g_2(x)$.
Here the solid and the dotted lines are for the estimates of the additive first-order scheme
based on the marginal integration developed in this paper and the modified backfitting in [24]
and [23], respectively; the dashed line is for the estimates of the partially linear first-order
scheme based on the approach developed in this paper.
The pictures of the additive first-order scheme indicate that the estimated
function of $g_1(\cdot)$ appears to be linear as in [1], while the estimated
function of $g_2(\cdot)$ seems to be nonlinear. This suggests using a
partially linear spatial autoregression of the form

(4.3) $\beta_0 + \beta_1X^{(1)}_{ij} + g_2(X^{(2)}_{ij})$.

For this case, we view model (4.3) as a special case of model (1.2) with
$\mu = \beta_0$, $\beta = \beta_1$, $Z_{ij} = X^{(1)}_{ij}$,
$X_{ij} = X^{(2)}_{ij}$ and $g(\cdot) = g_2(\cdot)$. Based on the bandwidth
of 0.4 selected using a cross-validation selection procedure, the resulting
estimates were $\hat\beta_0 = 1.311$, $\hat\beta_1 = 0.335$ and
$\hat g_2(\cdot)$, which are also plotted in Figure 1(a) and (b) with dashed
lines, respectively.

We find that our estimate of $\beta_1$ based on the partially linear
first-order scheme is almost the same as Besag's first-order auto-normal
schemes, which are tabulated in Table 1 below. The estimate of $g_2(\cdot)$
based on the partially linear first-order scheme, similarly to that given in
Figure 1(b) based on both the marginal integration and the backfitting of the
additive first-order scheme, indicates nonlinearity with a change point
around x = 7.8.

One may wonder whether the apparent nonlinearity in $g_2$ could arise from
random variation even if $g_2$ is linear. The similarity of the two estimates
using different techniques is reassuring, but we also did some simulations
with samples from the auto-normal first-order scheme with conditional mean
(4.1) with $\gamma_0 = 0.16$, $\gamma_1 = 0.34$, $\gamma_2 = 0.14$ and with
constant conditional variance $\sigma^2 = 0.11$, where the values of the
parameters were chosen to be close to the estimated values of the
auto-normal first-order scheme for the grain yields data given by Besag's [1]
coding method. The sample size in the simulation is the same as that of the
grain yields data, that is, m = 20 and n = 25. We repeated the simulation
100 times. For each simulated realization, our partially linear first-order
scheme of (4.3) was estimated by the approach developed in this paper with
the bandwidth of 0.4 (the same as that used for the grain yields data above).
The boxplots of the 100 simulations for the nonparametric component
$g_2(\cdot)$ are depicted in Figure 2. A six-number summary for $\beta_1$ is
given in Table 2.



Table 1
Estimates of different first-order conditional autoregression schemes for
Mercer and Hall's data

Scheme                        Regressor: $X^{(1)}_{ij}$   Regressor: $X^{(2)}_{ij}$        Variance of residuals
Partially linear              $\hat\beta_1 = 0.335$       $\hat g_2(\cdot)$: Figure 1(b)   0.1081
Auto-normal ([1], Table 8)    $\hat\gamma_1 = 0.343$      $\hat\gamma_2 = 0.147$           0.1099
Auto-normal ([1], Table 10)   $\hat\gamma_1 = 0.350$      $\hat\gamma_2 = 0.131$           0.1100
Fig. 2. Boxplots of the estimated partially linear first-order scheme for the 100 simulations
of the auto-normal first-order model for the nonparametric component $g_2(x)$. The sample
size is m = 20 and n = 25.



Table 2
A six-number summary for $\hat\beta_1$

Min.     1st Qu.  Median   Mean     3rd Qu.  Max.
0.2313   0.3129   0.3405   0.3387   0.3684   0.4182

It is clear that the estimate for $\beta_1$ is quite stable with median
almost equal to the actual parameter $\beta_1 = 0.34$, and the estimate for
$g_2$ also looks quite linear with small errors around x = 7.8. The
simulation results show that it is unlikely that the estimated nonlinearity
in $g_2$ for the grain yields data in Figure 1(b) should be caused by random
variations with the true model being linear. In fact, the accuracy of our
estimates is quite high around x = 7.8, since the samples of the grain yields
are quite dense there (see Figure 3).

Table 1 reports the variance of the residuals of the partially linear
first-order scheme, as well as of Besag's auto-normal schemes. By comparison,
the partially linear first-order scheme gives some improvement over the
auto-normal schemes, but perhaps surprisingly small in view of the rather
pronounced nonlinearity of Figure 1. In an attempt to understand this, we
also calculated the variances of the estimated components and the variance of
$Y_{ij}$ over $\{(i,j): 2 \le i \le 19, 2 \le j \le 24\}$, reported in
Table 3. By combining Table 3 with Table 1, we can see the following:
(a) clearly, for the partially linear first-order scheme, as well as Besag's
auto-normal schemes, the variances of the residuals (in Table 1) are quite
large, all about half of the variance of $Y_{ij}$ (given in Table 3); (b) the
variances of the first component, $\mathrm{Var}\{g_1(X^{(1)}_{ij})\}$, are
much larger (6 times) than those of the second component,
$\mathrm{Var}\{g_2(X^{(2)}_{ij})\}$, and therefore, the first components in
the fitted conditional means play a key role, while the impact of the second
components is smaller; and (c) if we are only concerned with the estimate of
the second component $g_2$, then the improvement of the partially linear
first-order scheme over the auto-normal schemes is clear if measured in terms
of the relative increase of the variance:
(0.0114 - 0.0102)/0.0102 × 100% = 11.76% and
(0.0114 - 0.0081)/0.0081 × 100% = 40.74% (cf. Table 3). These facts serve
at least as tentative explanations of the slightly contradictory messages of
Figure 1 and Table 1. The partially linear scheme provides an alternative
choice of fitting and conveys more information on the data. A referee
suggested that the apparent nonlinearity may be due to an inhomogeneity in
the data (cf. [25]). This is a possibility that cannot be ruled out. Also, for
time series it is sometimes difficult to distinguish between nonlinearity and
nonstationarity.

Table 3
Variances of components of different first-order conditional autoregression
schemes for Mercer and Hall's data

Scheme                        $\mathrm{Var}\{Y_{ij}\}$   $\mathrm{Var}\{g_1(X^{(1)}_{ij})\}$   $\mathrm{Var}\{g_2(X^{(2)}_{ij})\}$
Partially linear              0.205                      0.0661                                0.0114
Auto-normal ([1], Table 8)    0.205                      0.0693                                0.0102
Auto-normal ([1], Table 10)   0.205                      0.0722                                0.0081

Fig. 3. The estimated kernel density of $X^{(2)}_{ij}$ defined in (4.3) for the grain yields data.

5. Conclusion and future studies. This paper uses a semiparametric ad- 
ditive technique to estimate conditional means of spatial data. The key idea 
is that the semiparametric technique is employed as an approximation to the 
true conditional mean function of the spatial data. The asymptotic prop- 
erties of the resulting estimates are given in Theorems 3.1-3.3. The results 
of this paper can serve as a starting point for research in a number of di- 
rections, including problems related to the estimation of the conditional 
variance function of a set of spatial data. 

In Section 4 our empirical studies show that the estimated form of $g_2$
is nonlinear. To further support such nonlinearity, one may need to
establish a formal test. In general, we may consider testing for linearity
in the nonparametric components $g_l(\cdot)$ involved in model (1.2).

In the time series case, such test procedures for linearity have been studied 
extensively during the last ten years. Details may be found in [10]. In the 
spatial case, Lu et al. [23] propose a bootstrap test and then discuss its 
implementation. To the best of our knowledge, there is no asymptotic theory 
available for such a test, and the theoretical problems are very challenging. 

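To test $H_0: g_k(X^{(k)}_{ij}) = X^{(k)}_{ij}\gamma_k$, where
$\{\gamma_k\}$ is an unknown parameter for each given $k$, our experience
with the nonspatial case suggests using a kernel-based test statistic of the
form

$$L_k = \sum_{i_1=1}^m\sum_{j_1=1}^n\ \sum_{i_2=1,\ne i_1}^m\ \sum_{j_2=1,\ne j_1}^n
K_{i_1j_1}(X_{i_2j_2}, b)\,\hat\varepsilon^{(k)}_{i_1j_1}\hat\varepsilon^{(k)}_{i_2j_2},$$

where $K_{i_1j_1}(X_{i_2j_2}, b) = \prod_{l=1}^p
K\bigl((X^{(l)}_{i_1j_1} - X^{(l)}_{i_2j_2})/b_l\bigr)$, as defined at the
beginning of Section 2, and
$\hat\varepsilon^{(k)}_{ij} = Y_{ij} - \hat\mu - Z_{ij}^T\hat\beta
- X^{(k)}_{ij}\hat\gamma_k - \sum_{l=1,\ne k}^p \hat g_l(X^{(l)}_{ij})$, in
which $\hat\mu$, $\hat\beta$, $\hat\gamma_k$ and $\hat g_l(\cdot)$ are the
corresponding estimators of $\mu$, $\beta$, $\gamma_k$ and $g_l(\cdot)$.
These estimators may be defined similarly as in Section 2.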
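
A kernel-weighted double sum of residual products of this kind can be computed directly on a small grid. The sketch below is an assumption-laden illustration rather than the authors' procedure: the function names are hypothetical, an Epanechnikov product kernel is used, the residuals are taken as given, and the inner sums exclude sites sharing either the row index or the column index with the outer site. No normalization is applied.

```python
def product_kernel(x1, x2, b):
    """Product of Epanechnikov kernels over the coordinates of x1 - x2."""
    value = 1.0
    for a, c, h in zip(x1, x2, b):
        u = (a - c) / h
        value *= 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0
    return value


def kernel_test_statistic(X, resid, b):
    """Unnormalized kernel-based statistic on an m x n grid.

    X     : m x n nested list of covariate tuples (length p each).
    resid : m x n nested list of residuals computed under the null.
    b     : tuple of p bandwidths.
    """
    m, n = len(resid), len(resid[0])
    total = 0.0
    for i1 in range(m):
        for j1 in range(n):
            for i2 in range(m):
                if i2 == i1:
                    continue
                for j2 in range(n):
                    if j2 == j1:
                        continue
                    weight = product_kernel(X[i1][j1], X[i2][j2], b)
                    total += weight * resid[i1][j1] * resid[i2][j2]
    return total
```

The cost grows like $(mn)^2$, which is manageable for a 20 × 25 grid; a bootstrap calibration, as discussed for the spatial case in [23], would call such a function repeatedly.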

Our experience and knowledge with the nonspatial case would suggest
that the normalized version of $L_k$ should have an asymptotically normal
distribution under $H_0$, although we have not been able to rigorously prove
such a result. This issue and other related issues, for example, a test for
isotropy, are left for future research.

APPENDIX: PROOFS OF THEOREMS 3.1 AND 3.2 

Throughout the rest of the paper, the letter C is used to denote constants
whose values are unimportant and may vary from line to line. All limits are
taken as $(m,n) \to \infty$ in the sense of (3.1) unless stated otherwise.

A.1. Technical lemmas. In the proofs we need to repeatedly use the
following cross term inequality and uniform-consistency lemmas.

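Let $f_{(-k)}(\cdot)$ and $f(\cdot)$ be the probability density functions of
$X^{(-k)}_{ij}$ and $X_{ij}$, respectively. For $k = 1,2,\ldots,p$ and
$s = 1,2,\ldots,q$, let

$$d_{ijk}(x_k) = f(X^{(-k)}_{ij}, x_k)^{-1}w_{(-k)}(X^{(-k)}_{ij})f_{(-k)}(X^{(-k)}_{ij}),
\qquad
\Delta_{ij}(x_k) = K\bigl((X^{(k)}_{ij} - x_k)/b_k\bigr)d_{ijk}(x_k)\varepsilon^{(s)}_{ij},$$

where $\varepsilon^{(s)}_{ij} = Z^{(s)}_{ij} - \mu^{(s)}_Z - H^{(s)}(X_{ij})$.

Lemma A.1. (i) Let Assumptions 3.1-3.6 hold. Then under (3.1),

$$\frac{1}{\sqrt{mn\,b_k}}\sum_{i=1}^m\sum_{j=1}^n \Delta_{ij}(x_k) \to_D N(0, \mathrm{var}^{(s)}_{1k}),$$

where

$$\mathrm{var}^{(s)}_{1k} = J\int V^{(s)}(x)
\frac{[w_{(-k)}(x^{(-k)})f_{(-k)}(x^{(-k)})]^2}{f(x)}\,dx^{(-k)},$$

in which $J = \int K^2(u)\,du$,
$V^{(s)}(x) = E((Z^{(s)}_{ij} - \mu^{(s)}_Z - H^{(s)}(x))^2|X_{ij} = x)$ and
$x^{(-k)}$ is the $(p-1)$-dimensional vector obtained from $x$ with the $k$th
component, $x_k$, deleted.

(ii) Let Assumptions 3.1-3.6 hold. For any $(m,n) \in \mathbb{Z}^2$, define
two sequences of positive integers $c_1 = c_{1mn}$ and $c_2 = c_{2mn}$ such
that $1 < c_1 < m$ and $1 < c_2 < n$. For any $x_k$, let

(A.1) $J(x_k) = \sum_{i=1}^m\sum_{j=1}^n\ \sum_{i'=1}^m\sum_{j'=1,\ (i',j')\ne(i,j)}^n
E[\Delta_{ij}(x_k)\Delta_{i'j'}(x_k)]$,

(A.2) $J_1 = c_1c_2\,mn\,b_k^{(\lambda r-2)/(\lambda r+2)+1}$,
$\qquad J_2 = C\,mn\,b_k^{2/(\lambda r)}
\sum_{i=\min(c_1,c_2)}^{\sqrt{m^2+n^2}} i\{\varphi(i)\}^{(\lambda r-2)/(\lambda r)}$,

where $C > 0$ is a positive constant and $\lambda > 2$ and $r \ge 1$ are as
defined in Assumptions 3.1 and 3.2(ii). Then for any $x_k$,

(A.3) $|J(x_k)| \le C[J_1 + J_2]$.

Proof. The proof of (i) follows similarly from that of Lemma 3.1 of [16],
while the proof of (ii) is analogous to that of Lemma 5.2 of [16]. When
applying Lemma 3.1, one needs to notice that $E[\varepsilon^{(s)}_{ij}] = 0$
and $N = 2$. For the application of Lemma 5.2, we need to take
$\delta = \lambda r - 2$, $d = 1$ and $N = 2$ in the lemma. □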
Lemma A.2. Let $(i,j) \in \mathbb{Z}^2$ and
$\xi_{ij} = K((X^{(1)}_{ij} - x_1)/b_1, \ldots, (X^{(p)}_{ij} - x_p)/b_p)\,\theta_{ij}$,
where $K(\cdot)$ satisfies Assumption 3.5, and
$\theta_{ij} = \theta(X_{ij}, Y_{ij})$, in which $\theta(\cdot,\cdot)$ is a
measurable function, satisfy $E[\xi_{ij}] = 0$ and
$E[|\theta_{ij}|^{\lambda r}] < \infty$ for a positive integer $r$ and some
$\lambda > 2$. In addition, let Assumptions 3.1-3.6 hold. Then there exists
a constant $C$ depending on $r$ but depending on neither the distribution of
$\xi_{ij}$ nor $b_\pi$ and $(m,n)$ such that

(A.4) $E\Bigl[\Bigl(\sum_{i=1}^m\sum_{j=1}^n \xi_{ij}\Bigr)^{2r}\Bigr] \le C(mn\,b_\pi)^r$

holds for all $p$ sets of bandwidths.

Proof. The proof of this lemma follows from that of Lemma 6.2 of [11]. □

Lemma A.3. Let $\{Y_{ij}, X_{ij}\}$ be an
$\mathbb{R}^1 \times \mathbb{R}^p$-valued stationary spatial process with the
mixing coefficient function $\varphi(\cdot)$ as defined in (3.3). Set
$\theta_{ij} = \theta(X_{ij}, Y_{ij})$ and $R(x) = E(\theta_{ij}|X_{ij} = x)$.
Assume that $E|\theta_{ij}|^{\lambda r} < \infty$ for some positive integer
$r$ and some $\lambda > 2$, and that Assumptions 3.1-3.6 hold. Let $R(x)$
and $f(x)$ be twice differentiable with bounded second-order derivatives on
$\mathbb{R}^p$. Then

(A.5) $\sup_{x \in S_W}\Bigl|(mn\,b_\pi)^{-1}\sum_{i=1}^m\sum_{j=1}^n
\theta_{ij}\prod_{l=1}^p K\bigl((X^{(l)}_{ij} - x_l)/b_l\bigr) - f(x)R(x)\Bigr|
= O_P\Bigl((mn\,b_\pi^{1+2/r})^{-r/(p+2r)} + \sum_{k=1}^p b_k^2\Bigr)$

holds for all $p$ sets of bandwidths.

Proof. The lemma follows from Lemma A.3 of [11]. □

Lemma A.4. Let $U_{m,n}$ be as defined in (2.4). Suppose Assumptions 3.1,
3.2 and 3.4 hold. In addition, if $b_\pi \to 0$ and $mn\,b_\pi \to \infty$,
then uniformly over $x \in S_W$,

(A.6) $U_{m,n}(x) \to_P U = f(x)\begin{pmatrix} 1 & 0^T \\ 0 & \mu_2(K)I_p \end{pmatrix}$,

where $0 = (0,\ldots,0)^T \in \mathbb{R}^p$,
$\mu_2(K) = \int u^2K(u)\,du$, $I_p$ is an identity matrix of order $p$ and
$\to_P$ denotes convergence in probability.

Proof. The proof follows from Lemma A.3. Its details are available
from the proof of Lemma 6.4 of [11]. □
A.2. Proofs of Theorems 3.1 and 3.2. To prove our main theorems, we
will often use the property of the marginal integration estimator, which is to
be established here and is of independent interest in some other applications.

Let $H^{(s)}(x) = E[(Z^{(s)}_{ij} - \mu^{(s)}_Z)|X_{ij} = x]$ be the
conditional regression of $Z^{(s)}_{ij} - \mu^{(s)}_Z$ given $X_{ij} = x$,
$P^{(s)}_{k,w}(x_k) = E[H^{(s)}(X^{(-k)}_{ij}, x_k)w_{(-k)}(X^{(-k)}_{ij})]$
the weighted marginal integration of $H^{(s)}(x)$, and
$H^{(s)}_a(x) = \sum_{k=1}^p P^{(s)}_{k,w}(x_k)$ the additive approximation
of $H^{(s)}(x)$ based on marginal integrations, for $s = 0,1,\ldots,q$. The
estimates of these functionals were given in Section 2. Let $W(x)$ and
$S_W$ be as defined in Lemma A.3. The following lemma is necessary for the
proof of the main theorems.



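Lemma A.5. Suppose Assumptions 3.1-3.5 hold and the bandwidths satisfy
$mn\,b_k^5 = O(1)$, $\sum_{l=1,\ne k}^p b_l^2 = o(b_k^2)$. Then under (3.1),

(A.7) $\sqrt{mn\,b_k}\,(\hat P^{(s)}_{k,w}(x_k) - P^{(s)}_{k,w}(x_k) - \mathrm{bias}^{(s)}_{1k}) \to_D N(0, \mathrm{var}^{(s)}_{1k})$,

where

$$\mathrm{bias}^{(s)}_{1k} = \frac{1}{2}b_k^2\mu_2(K)\int w_{(-k)}(x^{(-k)})
f_{(-k)}(x^{(-k)})\frac{\partial^2 H^{(s)}(x)}{\partial x_k^2}\,dx^{(-k)},
\qquad
\mathrm{var}^{(s)}_{1k} = J\int V^{(s)}(x)
\frac{[w_{(-k)}(x^{(-k)})f_{(-k)}(x^{(-k)})]^2}{f(x)}\,dx^{(-k)},$$

in which $\mu_2(K) = \int u^2K(u)\,du$, and the other quantities are as
defined in Lemma A.1.

Let $H^{(s)}_k(x_k) = E[(Z^{(s)}_{ij} - \mu^{(s)}_Z)|X^{(k)}_{ij} = x_k]$.
Furthermore, if $H^{(s)}(x) = \sum_{k=1}^p H^{(s)}_k(x_k)$ and
$E[w_{(-k)}(X^{(-k)}_{ij})] = 1$, then under (3.1),

(A.8) $\sqrt{mn\,b_k}\,(\hat P^{(s)}_{k,w}(x_k) - H^{(s)}_k(x_k) - \mathrm{bias}^{(s)}_{2k}) \to_D N(0, \mathrm{var}^{(s)}_{2k})$,

where

$$\mathrm{bias}^{(s)}_{2k} = \frac{1}{2}b_k^2\mu_2(K)\frac{\partial^2 H^{(s)}_k(x_k)}{\partial x_k^2}$$

and

$$\mathrm{var}^{(s)}_{2k} = J\int V^{(s)}(x)
\frac{[w_{(-k)}(x^{(-k)})f_{(-k)}(x^{(-k)})]^2}{f(x)}\,dx^{(-k)},$$

where $V^{(s)}(x) = E[(Z^{(s)}_{ij} - \mu^{(s)}_Z - \sum_{k=1}^p H^{(s)}_k(x_k))^2|X_{ij} = x]$.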
Proof. By the law of large numbers, it is obvious that, for $x_k \in [-L_k, L_k]$,

m n 



i=l j=l 

(A.9) 

_ p(s) - - „ / 1 



Throughout the rest of the proof, set 7 = (1, 0, ... , 0) T £ R 1+p . Note that, 
by the notation and definitions in Section 2, 

H$ n (xt k \ Xk )-HM(xt k \ Xk ) 

(A.10) = 7 ^-Ux^\x fc )yWj4r fe ), Xfe )-ijW(4r fc ) )2;fe ) 

= 7 ^m,n {p^ij i x k) B m ,n {-^ij ) X k)i 

where DH^(x) = (dH^(x)/dx 1 , . . . , dH^ s \x)/dx p ) with x= (x^ k \x k ), 
the symbol is as defined in (2.5) and 

i!n,o(*) " « m ,n,Oo(x)F«(x) - ^.oi (z) (x) 6) T \ 

( li 0«0 - l7m,n,io W - Cm,n,n (x) (s) (x) 6) T / 



(A.ll) 



r 



„ Bm,n,l (x) 

Therefore, by the uniform consistency in Lemma A. 4, for £ [— L k ,L k \, 

m n 

i=i j=i 

(A.12) 

m n 

i=ii=i 



+ P (4n)(mn) 1 ££ J B mi „ i o(A^. fc) , x k )w^ k ) (X 
i=ij=i 

where d mn = (mn&i +2 / r )- r /( p+2r) + £f =1 fy 2 . Note that 

i'=i j'=i v 
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^-Q^-(x)(X^ t -x e )jK Vjl (x,b) 



(A. 13) =(mn& 7r ) 1 ^2 ^2 ■q i/j/ (x)K i > j >(x,b) 

i'=ij'=l 

m n 

- (Z (s) - vgXmnbJ- 1 £ E K Vj >{x,b) 

i>=\j<=i 

= -^m,n,o( x ^ i x k) + B m n (x^ \x k ), 

where mr (x) = Z$ - $ - !£«(*) - £f =1 - *,). 

Clearly, the result of Z s — fj^ = Op( -^= ) together with the uniform 
consistency in Lemma A. 3 leads to 



'run, 

which holds uniformly with respect to x = (x^~ k \x k ) £ Sw- Now it follows 
from (A.12)-(A.13) by exchanging the summations over and (it ,f) that 

m n / v( k ) _ _ \ 

(A ' 14) 

m n f X X \ 

+ P {c mn )(mnb k y l £ £ f U B^^fc) 

i'=lj'=l \ ° k J 

+ o P / 



>mn 



where ^ (* fc ) = E^i E"=i r 1 < x *>("*) (4"* W (4^ > 

x k )K$), and fl$>(a; fc ) = £"=i ^(^W^' 



(0 v(0 



• J 



x k )K.[J)„ in which 6(- fe) =nf =W fe^ and k\J), = Y? l=im K( B[ - 

Recall e[f = z\f - $ - iT^^y) = z\f - E(Z$\Xij). Note that the 
properties (compact support) of the kernel function in Assumption 3.5 show 
that, if Ktfy > and K((X$ - x k )/b k ) > in (A.14), then \xf jt - X<§\ < 

(k) 

Cbi — > for I / k and \X^j, — x k \ < Cb k — > 0, as m — ► oo and n — ► oo. There- 
fore, if K\j$, > and K{{xf], - x k )/b k ) > in (A.14), then by Taylor's 
expansion (around $X_{ij}$) together with the uniform continuity of second
partial derivatives of $g(\cdot)$ in Assumption 3.4,



dxi v " 



E 

l=l,l^k 

^xV^ 



(Xt k \ xh )(x$ - x k ) 



>) + I 



d *H^{x\j k \x k ) 



(4? - x k y 



+ 



0(1) 



E + E b ibk + b 2 k 

Ll,l'=l,^k l=l,^k 



+ 5 £ 



dxi dxy 



0{hb v ) 



Then under > and - x k )/b k ) > 0, 

m n 

i=lj=l 

1 m n 



i=lj=l 



m n 

+ - £ 0(6i^){mn6 ( _ fc) }- 1 ££r 1 (^"* ) ,a; fc )«'(- fc )(^ 

i=ii=i 



x W^,^) ^) 
dxv 



m n 

( X (~k) 

--l,^k i=lj=l 



+ £ 0(6 i 6 fc ){mn& ( _ fc) }- 1 ^^r 1 (xi- fc) ,x fc )«; ( _ fe) (x{ 
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d 2 H (s) (X (-V jXk) 



dxi dx k 



K (-k) 



(\ E h ^'+ E hb k + bl) -o(l) 



X 



mnb 



m n 



Again, using the uniform consistency in Lemma A. 3, we have 



B fj'{ x k) = di'j'k(x k ) 



■« ^H^{xtf\x k )^ 



+ 0p(4" fc) ) 



E 0(6,6,0 



(A.15) 



,,,'=1,^ 
E 0(6,6 fc ) 
'l 



^'i'fc (** ) dx^8x~ V + P( C mn > , 



^,(x fc ) ^ + Op(cL^) 



+ - £ o(l)&i6|/ + E o(l)&^ + o(l)6 2 fc 
\ z ,,,'=i,^fc l=i,j=k ) 

x [d i /j/ A .(x Jfe ) +Op(c^)], 

where = /(X^ fc ), Xfc )-i W( _ fe) (xf fc ))/ ( _ fc) (xfr fc )). 

In addition, denote by 

d* 3k (x k )^w^ k) {x\- k) )f { _ k) {x\j k) ) and tf 6fc (x fc ) = ^(g) 

Then similarly to (A.15), 



B*,f(x k ) = d*, jlk (x k ) ef jt + U X W - Xk ) 



^H^{xtf\x k ) 



dxl 



(A.16) +Op(cL fc) ) 



1 i,i'=i,^k 



d 2 H( s \x~r,x k ) 

dlf k (x k ) + Op(cL fc) ) 
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p r d 2 H^(X^~ k ^ x ) 

+ £ 0{hb k ) [4 fk (x k ) +Op(cL fc) ) 

+ (| E E o(l)6i&fc + o(l)^ 

\ i,i'=i,^k i=i,^k I 

x KYfc(^fc) + Op(c(- n fc ))]. 

Therefore, by (A.14)-(A.16), 

(A.17) =T^ + P (c mn )T<^ + P (l) £ bf 

+ o P (i) ]T b e b k + op(i)bl + o P (i) 



'111)1 



where 



m n , — Ti \ 



i=lj=\ 



'» » /X- — 
+ (mn^)" 1 ^^^ — ^- 1 d ijk (x k ) 

(A.18) i=ij=i V k ' 



'j 



l (X^ x? d 2 H^{x\- k \ Xk ) 
2 [Xi i Xk} dx\ 



rp(k) . rp{k) 

mill "i - mn2' 



and Tmn can be expressed similarly to (A.18) with dij k (x k ) replaced by 
d* jk (x k ). 

We next consider and t£J 2 . Clearly, JSp^] = since £ , (e^ ) |X ij ) = 
0. We calculate the asymptotic variance of T^. Note that 

(A.19) J B[T,S 1 ] 2 = J 1 (x fc ) + J2(^), 

where 

( fc ) „ x 

Ji(x fc ) = (mr^EE^ IT 2 ) 4- fc (x fc )(e 

i=lj=l L V / 
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m n m n 

1=1,3=1 i'=l j'=l 
i'^i or j'^j 

in which Aij(x k ) = K{{X^' — x k )/b k ) dij k {x k )^fj . A simple calculation im- 
plies 



Ji(x k ) = -\jE[d% k (x k )e%\x\f = x k ]f k (x k ){l + o(l)) 



(A.20) 



where 



mnb k 



mnb 



-(l + o(l))C k (J,V), 



c k (j, v) = j />) (3 q K- fe )(^ ( - fc) )/(- fe )^ ( - fc) )] 2 dg( - fc)) 

in which J = / K 2 (u) (in, = £[(eg } ) 2 |A%- = x], and / fc (x fc ) is the den- 

sity function of X^- . To deal with the cross term J2{x k ), we need to use 
Lemma 6.1. Under the assumptions of the lemma, it leads to 



Ji(xk) < C(mnb k ) 1 
(A.21) 



,(Ar-2)/(Ar+2) 

6 fc C 1 C 2 



+ 6. 



-(Ar-2)/(Ar) £ t { ¥? (t)}(Ar-2)/(Ar) 

\ t=min{ci ,£2} ■ 



Take ci = C2 = [6 



_ r ,-(Ar-2)/(oAr) 1 



where [it] < it denotes the largest integer 



part of it. Then since a > 2(Ar + 2)/Ar in Assumption 3.3, 
and it hence follows from (A.21) and Assumption 3.3 that 



2(Ar-2) Ar-2 
aXr ^ Ar+2' 



hixk) < C(mnb k ) 



-1 



(A.22) 



6 (Ar-2)/(Ar+2)-(2(Ar-2))/(aAr) 



+ctE%W} (Ar - 2)/w 

i=ci 



= o((mnb k ) 1 ), 

using cf Et= Cl t{^(t)} (Ar ^ 2)/(Ar) < cf Et= Cl * ap - 1 {¥>(*)} (Ar ~ 2)/(Ar) ^ by As- 
sumption 3.3. 

Now the asymptotic variance of $T^{(k)}_{mn1}$, using (A.19), (A.20) and
(A.22), equals the right-hand side of (A.20), that is,

(A.23) $(mn\,b_k)E[T^{(k)}_{mn1}]^2 \to J\int V^{(s)}(x)
\frac{[w_{(-k)}(x^{(-k)})f_{(-k)}(x^{(-k)})]^2}{f(x)}\,dx^{(-k)}$.



Next, we consider the term T^ 2 m (A. 18). From (A. 18), together with 
the property of the kernel function in Assumption 3.5, 



1. 



i mn2 — 2 



dxk" 1 



fk(x k )V2(K) + P 



mnJ u k 



= blMK)f k (x k ) J W{ _ k){x i-k)f9(^\x k )^_ k) + Qp{bl) 

= bias$ +op(6fe), 

where = (mn^ +2/r )" r/(1+2r) + 6g and /i 2 (^) = Ju 2 K(u) du. 

Similarly, one can show Tmli = Op(l/ 'y 1 mnb k + 6^). Based on the condi- 
tions, mn&| = O(l) and X)f=i ^fc^l = (^I)j the remaining terms in (A. 17) 
can be neglected since 

= (1 + btV^h) ((mnbl +2 / r r r/iP+2r) + E bf" 



\J mnbkCr, 



yj mnb 



1=1 



0. 



l=l,^k 



mnbi 



E * 



1/2 



0. 



1/2 



Vmnb k E = OC 1 ) mnb l E 6 ? 







l=l,^k 



1=1, \ 

and J mnbi. } = bV 2 — > 0. 
v sjmn & 

Therefore, in view of what we have derived, to complete the proof of (A. 8), 



it suffices to show that \J mnb k T^ nX —> N(0, var^), which follows from Lem- 
maA.l(i). □ 



Proof of Theorem 3.1. We note that 

(1 m n \ — 1 / m n \ 

-E£%%j (-££%^-%*)l 
1=1 3=1 / \ 1=13=1 / 

— I nZZ\-lnZY 

— \ JD mn) D mn' 

Denote by H ( a s \x) = £f =1 P£fe) and H a (x) = ZLi P l%( x l) the addi ~ 
tive approximate versions to H^(x) = E[{z\j — ^ )\Xij = x] and H{x) = 
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E[(Zij - Hz)\Xij = x], respectively, and by Hi% n (x) = Ya=\ ]( x k) and 

H a ,mn{x) = Z)f=i Pi Z w ( x i) the corresponding estimators of Hi s \x) and H a (x). 
Then we have 



-, m n -i m n 

1=1 J=l 1=1 J = l 

1 m n -i m n 

(A-25) • -VVA?'" (Z*.) t + — E E A ^ a A « aT 



mn r-f r-* J J mn . , . , 

r=i j=i i=i j=i 



= R zz 

/ y mn,l 
k=l 

where Z>* = Z^- - tf a (A%) and Ag a = H a (Xij) - ff 0>mri (^). Moreover, 

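$$
\begin{aligned}
B^{ZY}_{mn} &= \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} Z^{*}_{ij}\varepsilon^{*}_{ij}
 + \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} Z^{*}_{ij}\bigl(\Delta^{a(0)}_{ij} - (\Delta^{a}_{ij})^{T}\beta\bigr) \\
&\quad + \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} \Delta^{a}_{ij}\varepsilon^{*}_{ij}
 + \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} \Delta^{a}_{ij}\bigl(\Delta^{a(0)}_{ij} - (\Delta^{a}_{ij})^{T}\beta\bigr) \\
&\equiv \sum_{k=1}^{4} B^{ZY}_{mn,k},
\end{aligned}
\tag{A.26}
$$

where $\varepsilon^{*}_{ij} = Y^{*}_{ij} - (Z^{*}_{ij})^{T}\beta$, $Z^{*}_{ij}$ and $Y^{*}_{ij} = \tilde Y_{ij} - H_a^{(0)}(X_{ij})$ are as defined in
Assumption 3.2(i) and Theorem 3.1, and $\Delta^{a(0)}_{ij} = H_a^{(0)}(X_{ij}) - H^{(0)}_{a,mn}(X_{ij})$.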
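So, to prove the asymptotic normality of $\hat\beta$, it suffices to show that

$$
B^{ZZ}_{mn} \to_P B^{ZZ}, \qquad \sqrt{mn}\,(B^{ZY}_{mn} - \mu_B) \to_D N(0, \Sigma_B),
\tag{A.27}
$$

where $B^{ZZ}$, $\mu_B$ and $\Sigma_B$ are as defined in Theorem 3.1. To this end, we need
to have

$$
\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl(\hat P^{(s)}_{k,w}(X^{(k)}_{ij}) - P^{(s)}_{k,w}(X^{(k)}_{ij})\bigr)^{2} = o_P(\sqrt{mn}),
\qquad s = 0, 1, \ldots, q.
\tag{A.28}
$$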



This is ensured by the following facts: due to (A.17), together with Lemma A.3
for $p = 1$,

$$
\sup_{x_k \in [-L_k, L_k]} \bigl|\hat P^{(s)}_{k,w}(x_k) - P^{(s)}_{k,w}(x_k)\bigr|
= O_P\bigl((mnb_k^{1+2/r})^{-r/(1+2r)} + b_k^{2}\bigr)
 + O_P(1)\sum_{l=1,\neq k}^{p} b_l^{2}
 + o_P(1)\sum_{l=1,\neq k}^{p} b_l b_k
 + o_P(1)\,b_k^{2}
 + O_P(1)\frac{1}{\sqrt{mn}},
$$

and owing to $mnb_k^{4(2+r)/(2r-1)} \to \infty$ for some integer $r > 3$ and $mnb_k^{5} = O(1)$,

$$
\sqrt{mn}\,\bigl((mnb_k^{1+2/r})^{-r/(1+2r)} + b_k^{2}\bigr)^{2}
\le C\bigl((mn)^{-(2r-1)/(1+2r)}\,b_k^{-4(2+r)/(1+2r)} + mnb_k^{8}\bigr)^{1/2} \to 0,
$$

$$
\sqrt{mn}\,\Biggl(O_P(1)\sum_{l=1,\neq k}^{p} b_l^{2} + o_P(1)\sum_{l=1,\neq k}^{p} b_l b_k + o_P(1)\,b_k^{2} + O_P(1)\frac{1}{\sqrt{mn}}\Biggr)^{2} \to 0.
$$
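Thus,

$$
\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl(\Delta^{a(s)}_{ij}\bigr)^{2}
= \sum_{i=1}^{m}\sum_{j=1}^{n}\Biggl(\sum_{k=1}^{p}\bigl[\hat P^{(s)}_{k,w}(X^{(k)}_{ij}) - P^{(s)}_{k,w}(X^{(k)}_{ij})\bigr]\Biggr)^{2}
= o_P(\sqrt{mn}).
\tag{A.29}
$$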
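The rate step behind (A.29) rests on an exponent identity: $\sqrt{mn}\,\{(mnb_k^{1+2/r})^{-r/(1+2r)}\}^{2}$ equals $\{(mn)^{-(2r-1)/(1+2r)}\,b_k^{-4(2+r)/(1+2r)}\}^{1/2}$, which tends to zero when $mnb_k^{4(2+r)/(2r-1)} \to \infty$. The sketch below checks that identity with exact rational arithmetic. It assumes the exponents read as just stated (the source text is partly garbled at this point), and the function names are illustrative.

```python
from fractions import Fraction


def rate_term_exponents(r):
    """Powers of mn and b in sqrt(mn) * ((mn * b**(1 + 2/r)) ** (-r/(1+2r))) ** 2."""
    r = Fraction(r)
    a = -r / (1 + 2 * r)              # exponent of the uniform rate when p = 1
    e_mn = Fraction(1, 2) + 2 * a     # sqrt(mn) contributes 1/2
    e_b = 2 * a * (1 + 2 / r)
    return e_mn, e_b


def bound_term_exponents(r):
    """Powers of mn and b in ((mn)**(-(2r-1)/(1+2r)) * b**(-4(2+r)/(1+2r))) ** (1/2)."""
    r = Fraction(r)
    e_mn = -(2 * r - 1) / (1 + 2 * r) / 2
    e_b = -4 * (2 + r) / (1 + 2 * r) / 2
    return e_mn, e_b


for r in range(3, 8):
    # Both expressions carry the same powers of mn and b for every r tried.
    print(r, rate_term_exponents(r), rate_term_exponents(r) == bound_term_exponents(r))
```

The remaining piece, $\sqrt{mn}\,(b_k^{2})^{2} = (mnb_k^{8})^{1/2}$, vanishes because $mnb_k^{8} = (mnb_k^{5})\,b_k^{3}$ and $mnb_k^{5} = O(1)$.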

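Therefore, using the Cauchy-Schwarz inequality, it follows that the $(s,t)$th
element of $B^{ZZ}_{mn,4}$ satisfies

$$
\bigl|B^{ZZ}_{mn,4}(s,t)\bigr|
= \Biggl|\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\Delta^{a(s)}_{ij}\Delta^{a(t)}_{ij}\Biggr|
\le \frac{1}{mn}\Biggl(\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl(\Delta^{a(s)}_{ij}\bigr)^{2}\Biggr)^{1/2}
    \Biggl(\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl(\Delta^{a(t)}_{ij}\bigr)^{2}\Biggr)^{1/2}
= o_P(1),
$$

and similarly

$$
B^{ZZ}_{mn,2}(s,t) = o_P(1), \qquad B^{ZZ}_{mn,3}(s,t) = o_P(1).
$$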
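The Cauchy-Schwarz step bounds an averaged cross-product of estimation errors by the product of their root sums of squares. A small numerical sketch of that bound follows. The arrays `d_s` and `d_t` are hypothetical stand-ins for the errors of two components over the grid, drawn at random purely for illustration.

```python
import math
import random

random.seed(1)
m, n = 15, 25
sites = [(i, j) for i in range(m) for j in range(n)]
mn = len(sites)

# Hypothetical estimation errors of components s and t at every site.
d_s = {site: random.gauss(0.0, 0.1) for site in sites}
d_t = {site: random.gauss(0.0, 0.1) for site in sites}

# Averaged cross-product: the (s, t) entry of the fourth term.
entry = sum(d_s[x] * d_t[x] for x in sites) / mn

# Cauchy-Schwarz bound: (1/mn) * sqrt(sum d_s^2) * sqrt(sum d_t^2).
bound = (math.sqrt(sum(d_s[x] ** 2 for x in sites))
         * math.sqrt(sum(d_t[x] ** 2 for x in sites)) / mn)

print(abs(entry) <= bound)
```

If both sums of squares are $o_P(\sqrt{mn})$, the bound is $o_P((mn)^{-1/2})$, which is stronger than the $o_P(1)$ needed here.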

Now since $B^{ZZ}_{mn,1} \to E[Z^{*}_{11}(Z^{*}_{11})^{T}]$ in probability, it follows from (A.25) that the
first limit of (A.27) holds with $B^{ZZ} = E[Z^{*}_{11}(Z^{*}_{11})^{T}]$. To prove the asymptotic
normality in (A.27), by using the Cauchy-Schwarz inequality and (A.29),
we have

$$
\sqrt{mn}\sum_{k=2}^{4} B^{ZY}_{mn,k} = o_P(1).
$$

Therefore, the second limit of (A.27) follows from (A.26) and

$$
\sqrt{mn}\,(B^{ZY}_{mn,1} - \mu_B)
= \frac{1}{\sqrt{mn}}\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl[Z^{*}_{ij}\varepsilon^{*}_{ij} - \mu_B\bigr]
\to_D N(0, \Sigma_B),
$$

with $\mu_B = E[R_{ij}]$ and $\Sigma_B = \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} E[R_{00}R_{ij}^{T}]$, where $R_{ij} = Z^{*}_{ij}\varepsilon^{*}_{ij}$.

The proof of the asymptotic normality follows directly from the central limit
theorem for mixing random fields (see, e.g., Theorem 6.1.1 of [20]). When
(1.2) holds, the proof of the second half of Theorem 3.1 follows trivially. □
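The limiting covariance $\Sigma_B$ is a sum of lagged covariances over the whole lattice. As a purely illustrative sketch (not the paper's model), assume a scalar, mean-zero field built as a finite moving average of i.i.d. noise with variance `sigma2`. Such a field is finitely dependent and hence mixing. The coefficients in `coef` are made up. For this toy field the lag sum has the closed form $\sigma^{2}(\sum_{u,v} a_{uv})^{2}$, which the code confirms.

```python
# Toy field: R_ij = sum over (u, v) of coef[(u, v)] * e_{i-u, j-v}, with e i.i.d.
coef = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.25}   # hypothetical weights
sigma2 = 2.0                                      # noise variance (assumed)


def autocov(i, j):
    """Cov(R_00, R_ij) = sigma2 * sum_{u,v} a_{uv} * a_{u+i, v+j}."""
    return sigma2 * sum(a * coef.get((u + i, v + j), 0.0)
                        for (u, v), a in coef.items())


# Covariances vanish beyond lag 1 in each direction, so a small window suffices.
lags = range(-2, 3)
long_run = sum(autocov(i, j) for i in lags for j in lags)

closed_form = sigma2 * sum(coef.values()) ** 2
print(long_run, closed_form)
```

In the vector case each covariance becomes a matrix $E[R_{00}R_{ij}^{T}]$, but the summation over lags is the same.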



Proof of Corollary 3.1. Its proof follows from that of Theorem 3.1. □



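Proof of Theorem 3.2. Note that the estimator $\hat P_{k,w}(x_k) = \hat P^{(0)}_{k,w}(x_k) - \hat\beta^{T}\hat P^{Z}_{k,w}(x_k)$ is
given in (2.12) and that $P_{k,w}(x_k) = P^{(0)}_{k,w}(x_k) - \beta^{T}P^{Z}_{k,w}(x_k)$. Then

$$
\begin{aligned}
\hat P_{k,w}(x_k) - P_{k,w}(x_k)
&= \bigl[\hat P^{(0)}_{k,w}(x_k) - P^{(0)}_{k,w}(x_k) - \beta^{T}\bigl(\hat P^{Z}_{k,w}(x_k) - P^{Z}_{k,w}(x_k)\bigr)\bigr]
 - (\hat\beta - \beta)^{T}\hat P^{Z}_{k,w}(x_k) \\
&\equiv P_{mn,1}(x_k) - P_{mn,2}(x_k).
\end{aligned}
$$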
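The split of the estimation error into a term with $\beta$ held fixed and a term driven by $\hat\beta - \beta$ is an exact algebraic identity. The sketch below verifies it for arbitrary numbers with $q = 2$. Every value is made up for illustration, and the variable names are assumptions rather than notation from the paper.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical values at a fixed point x_k.
p0_hat, p0 = 0.83, 0.80                      # estimate and target, s = 0 component
pz_hat, pz = [0.41, -0.22], [0.40, -0.25]    # estimate and target, Z components
beta_hat, beta = [1.02, 0.48], [1.00, 0.50]  # estimated and true coefficients

# Total error of the combined estimator p0 - beta' pz.
total_error = (p0_hat - dot(beta_hat, pz_hat)) - (p0 - dot(beta, pz))

# First term: errors of the component estimators with beta held at its true value.
p_mn1 = (p0_hat - p0) - dot(beta, [a - b for a, b in zip(pz_hat, pz)])

# Second term: effect of estimating beta.
p_mn2 = dot([a - b for a, b in zip(beta_hat, beta)], pz_hat)

print(abs(total_error - (p_mn1 - p_mn2)) < 1e-12)
```

Since $\hat\beta - \beta = O_P((mn)^{-1/2})$, the second term is of smaller order than $(mnb_k)^{-1/2}$, which is why only the first term matters in the limit.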

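For any $c = (c_0, C_1^{T})^{T} \in R^{1+q}$ with $C_1 = (c_1, \ldots, c_q)^{T} \in R^{q}$, we note that,
for $x_k \in [-L_k, L_k]$,

$$
\sum_{s=0}^{q} c_s P^{(s)}_{k,w}(x_k) = c_0 P^{(0)}_{k,w}(x_k) + C_1^{T} P^{Z}_{k,w}(x_k)
= E\bigl[g^{**}(X^{(-k)}_{ij}, x_k)\,W_{(-k)}(X^{(-k)}_{ij})\bigr],
$$

where $g^{**}(x) = E[Y^{**}_{ij}\,|\,X_{ij} = x]$ with $Y^{**}_{ij} = c_0(Y_{ij} - \mu_Y) + C_1^{T}(Z_{ij} - \mu_Z)$, and
similarly,

$$
\sum_{s=0}^{q} c_s \hat P^{(s)}_{k,w}(x_k) = c_0 \hat P^{(0)}_{k,w}(x_k) + C_1^{T} \hat P^{Z}_{k,w}(x_k)
= \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} g^{**}_{mn}(X^{(-k)}_{ij}, x_k)\,W_{(-k)}(X^{(-k)}_{ij}),
$$

where $g^{**}_{mn}(x)$ is the local linear estimator of $g^{**}(x)$, as defined in Section 2
with $\tilde Y^{**}_{ij} = c_0\tilde Y_{ij} + C_1^{T}\tilde Z_{ij}$ in place of the response used there. Therefore, using the argument
of Lemma A.5, the distribution of

$$
\sqrt{mnb_k}\,\sum_{s=0}^{q} c_s\bigl(\hat P^{(s)}_{k,w}(x_k) - P^{(s)}_{k,w}(x_k)\bigr)
\tag{A.30}
$$

is asymptotically normal.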

Now taking $c_0 = 0$ in (A.30) shows that $\hat P^{Z}_{k,w}(x_k) \to P^{Z}_{k,w}(x_k)$ in probability,
which together with Theorem 3.1 leads to

$$
\sqrt{mnb_k}\,P_{mn,2}(x_k) = \sqrt{mnb_k}\,(\hat\beta - \beta)^{T}\hat P^{Z}_{k,w}(x_k) = O_P(\sqrt{b_k}) = o_P(1).
\tag{A.31}
$$

On the other hand, taking $c_0 = 1$ and $C_1 = -\beta$ in (A.30), we find that

$$
\sqrt{mnb_k}\,P_{mn,1}(x_k)
= \sqrt{mnb_k}\,\bigl[\hat P^{(0)}_{k,w}(x_k) - P^{(0)}_{k,w}(x_k) - \beta^{T}\bigl(\hat P^{Z}_{k,w}(x_k) - P^{Z}_{k,w}(x_k)\bigr)\bigr]
\tag{A.32}
$$

is asymptotically normal as in (A.8), with $Y^{**}_{ij} = Y_{ij} - \mu_Y - \beta^{T}(Z_{ij} - \mu_Z)$
and $g^{**}(x) = E(Y^{**}_{ij}\,|\,X_{ij} = x)$ instead of $Z^{(s)}_{ij}$ and $H^{(s)}(x)$ in Lemma A.5,
respectively. This finally yields Theorem 3.2. □